Rare and revealing: A new method for uncovering hidden patterns in data

Monday, August 11, 2025

In disciplines as diverse as public health, economics, genomics, and causal discovery, it’s important to detect dependence in data, that is, whether and how one variable is related to another. While researchers have developed methods to do this, current tests often fail to identify dependence that is confined to small regions of the data, where the relationship between variables differs from the one in most of the data. This characteristic, termed ‘rare dependence’ in a recent study by researchers from MBZUAI and other institutions, throws off many of today’s data analysis tools.

To address this problem, the researchers developed a new method that could be used in any field that analyzes observational data. It builds on kernel-based conditional independence testing and adds sample importance reweighting: the method identifies regions of the data that exhibit rare dependence and gives the samples in them more weight in the test.
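To make the idea of reweighting concrete, here is a minimal Python sketch of a kernel dependence statistic, the Hilbert-Schmidt Independence Criterion (HSIC), computed on a weighted sample. It is not the researchers’ implementation: the Gaussian kernel, the bandwidth of 1, the function names, the synthetic data and the hand-picked weights are all illustrative assumptions. With equal weights the statistic reduces to the usual (biased) HSIC estimate; shifting weight onto a small dependent subset raises it.

```python
import math
import random


def gaussian_gram(values, bandwidth=1.0):
    """Gaussian kernel matrix with k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2))."""
    return [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in values]
            for a in values]


def weighted_hsic(x, y, weights, bandwidth=1.0):
    """HSIC of the weighted empirical distribution that puts mass w_i on (x_i, y_i).

    Equal weights give the usual biased HSIC estimate. (Illustrative sketch only.)
    """
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalise to a probability vector
    K = gaussian_gram(x, bandwidth)
    L = gaussian_gram(y, bandwidth)
    n = len(x)
    # Weighted kernel means E_w[k(x_i, x')] and E_w[l(y_i, y')] for each sample i
    k_row = [sum(w[j] * K[i][j] for j in range(n)) for i in range(n)]
    l_row = [sum(w[j] * L[i][j] for j in range(n)) for i in range(n)]
    # HSIC = E[k l] + E[k] E[l] - 2 E[E'k E'l], every expectation taken under w
    joint = sum(w[i] * w[j] * K[i][j] * L[i][j] for i in range(n) for j in range(n))
    k_mean = sum(wi * ki for wi, ki in zip(w, k_row))
    l_mean = sum(wi * li for wi, li in zip(w, l_row))
    cross = sum(wi * ki * li for wi, ki, li in zip(w, k_row, l_row))
    return joint + k_mean * l_mean - 2 * cross


# Hypothetical "rare dependence" data: 190 independent points plus 10 with y = x
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [random.gauss(0, 1) for _ in range(190)] + x[190:]

uniform = [1.0] * 200
# Hand-picked weights that give the last 10 points 90% of the total mass
upweighted = [0.1 / 190] * 190 + [0.9 / 10] * 10

print(weighted_hsic(x, y, uniform))     # every sample counts equally
print(weighted_hsic(x, y, upweighted))  # dependent subset emphasised
```

Weights chosen after looking at the data can inflate the statistic even when there is no dependence, so a real test has to account for how the weights were selected, for example by learning them on one part of the data and testing on another. This sketch does not do that.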

The team tested their algorithm on synthetic and real-world data and found that it successfully identified relations between variables even in the presence of rare dependence, illustrating its potential to help researchers gain new insights from data analysis.

Yiqing Li, a research associate at MBZUAI and a co-author of the study, recently presented the findings at the International Conference on Machine Learning (ICML) in Vancouver. Yewei Xia, Xiaofei Wang, Zhengming Chen, Liuhua Peng, Mingming Gong, and Kun Zhang are the other co-authors.

What’s difficult about rare dependence?

Rare dependence occurs frequently in real-world datasets, and it can also be an issue in AI-related fields like causal inference, feature selection, and self-supervised learning. In medicine, for instance, drugs sometimes have unexpected effects that manifest only in a small subpopulation: opioids have been found to make pain worse in a subset of patients. Current independence tests struggle in these situations because such a small group barely registers when every sample is weighed equally.

Li and her colleagues describe their approach as assigning more “attention to the dependent sub-samples, successfully detecting rare dependence.” Data points judged to show stronger dependence are automatically given more weight.

The researchers tested their method on two synthetic datasets. They compared it to several other independence tests, including one based on the Hilbert-Schmidt Independence Criterion (HSIC), which treats all samples in a dataset the same way.
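For comparison, a standard HSIC test gives every sample the same weight. The Python sketch below is a hypothetical illustration, not code from the study: it assumes a Gaussian kernel with a fixed bandwidth of 1 (in practice a median heuristic is common), uses made-up function names, computes the biased HSIC statistic, and calibrates it with a permutation test. Shuffling y many times shows how large the statistic gets once any pairing with x has been destroyed.

```python
import math
import random


def gaussian_gram(values, bandwidth=1.0):
    """Gaussian kernel matrix with k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2))."""
    return [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in values]
            for a in values]


def hsic_biased(K, L):
    """Biased HSIC estimate, (1/n^2) * trace(K H L H), expanded into sums."""
    n = len(K)
    joint = sum(K[i][j] * L[i][j] for i in range(n) for j in range(n)) / n ** 2
    k_rows = [sum(row) / n for row in K]  # row means of K
    l_rows = [sum(row) / n for row in L]  # row means of L
    cross = sum(k * l for k, l in zip(k_rows, l_rows)) / n
    return joint + (sum(k_rows) / n) * (sum(l_rows) / n) - 2 * cross


def hsic_permutation_test(x, y, n_permutations=200, bandwidth=1.0, seed=0):
    """Return (statistic, p-value) for the null hypothesis that x and y are independent."""
    rng = random.Random(seed)
    K = gaussian_gram(x, bandwidth)
    L = gaussian_gram(y, bandwidth)
    observed = hsic_biased(K, L)
    n = len(x)
    exceed = 0
    for _ in range(n_permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        # Re-pairing y with x at random simulates data in which they are independent
        L_perm = [[L[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if hsic_biased(K, L_perm) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.uniform(-2, 2) for _ in range(60)]
    y = [v ** 2 + 0.1 * rng.gauss(0, 1) for v in x]  # nonlinear dependence
    print(hsic_permutation_test(x, y))
```

Because all samples count equally, a handful of dependent points among many independent ones barely moves this statistic, which is the blind spot that reweighting is meant to address.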

On both datasets, their method controlled type I errors, or “false positives,” in which a test concludes that variables are dependent when they are actually independent. It also had higher testing power in both cases, meaning it detected genuine dependence more often than the alternatives, which the researchers say confirms the need to reweight a subset of samples when dependence is rare.
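Type I error and power are usually estimated by simulation: run a test on many synthetic datasets where the right answer is known and count how often it rejects independence. The Python sketch below shows that bookkeeping. To stay short it swaps in a simple permutation test on correlation rather than HSIC or the researchers’ test, and the data generators, sample size, trial count and function names are all hypothetical.

```python
import random


def corr_permutation_pvalue(x, y, n_permutations, rng):
    """Permutation p-value for |covariance| between x and y.

    Permuting y leaves both standard deviations unchanged, so this gives the
    same p-value as using |Pearson correlation|.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    observed = abs(sum(a * b for a, b in zip(xc, yc)))
    shuffled = list(yc)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(sum(a * b for a, b in zip(xc, shuffled))) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)


def rejection_rate(make_data, trials=200, n=50, alpha=0.05, n_permutations=99, seed=0):
    """Fraction of simulated datasets on which the test rejects independence."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x, y = make_data(n, rng)
        if corr_permutation_pvalue(x, y, n_permutations, rng) <= alpha:
            rejections += 1
    return rejections / trials


def independent_data(n, rng):
    # Null hypothesis holds: any rejection here is a type I error (false positive)
    x = [rng.gauss(0, 1) for _ in range(n)]
    return x, [rng.gauss(0, 1) for _ in range(n)]


def dependent_data(n, rng):
    # Alternative holds: the rejection rate here estimates the test's power
    x = [rng.gauss(0, 1) for _ in range(n)]
    return x, [xi + rng.gauss(0, 1) for xi in x]


if __name__ == "__main__":
    print("estimated type I error:", rejection_rate(independent_data))
    print("estimated power:", rejection_rate(dependent_data))
```

A test controls type I error at level alpha when the first rate stays at or below alpha; among tests that do, the one with the higher second rate is more powerful.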


A scatter plot of synthetic data the researchers generated to evaluate independence tests in the presence of rare dependence; the red box marks the region where the rare dependence occurs. Traditional independence tests like HSIC evaluate the whole sample, which can produce misleading results.

Li and her co-authors also ran their test on a real-world dataset produced by the US Federal Reserve. The dataset covered the years 1990 to 2010 and included two variables: the exchange rate between the Japanese yen and the US dollar, and the US federal funds rate, the benchmark overnight interest rate targeted by the Federal Reserve and a widely watched signal of economic conditions.

They found that their combined method detected dependence between the two variables with high probability, while HSIC failed to reject independence. What’s more, it did so in a way that was interpretable, identifying the years 2001 and 2008, periods of great economic turmoil, as showing rare dependence.

“People want to know why and how economic crises happen and how to prevent them, but the number of data points for financial crises is small,” Li explains. “Normal testing methods fail in these scenarios, and this is one of the areas where our method can provide value.”

Li acknowledged that while the results are promising, there are tradeoffs when it comes to efficiency, as their method must first figure out which samples to focus on. “We have to solve an optimization problem to get the best reweighting function and then amplify the informative subsamples to detect the rare dependence,” she says. “Existing testing methods can test independence directly, which is faster, and testing the whole data can also provide important insights.”

The researchers also derived a conditional independence version of their test and combined the proposed tests with the PC algorithm, a popular method for causal discovery. The resulting Rare Dependence PC, or RDPC, can correctly learn the Markov equivalence class of the ground-truth causal graph from observational data in the presence of rare dependence.
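The PC algorithm discovers causal structure by starting from a fully connected graph, deleting an edge whenever a (conditional) independence test finds a set of variables that separates its two endpoints, and then orienting colliders. The independence test is a plug-in, which is where a rare-dependence test can be swapped in. The simplified Python sketch below is an illustration under stated assumptions, not RDPC itself: it omits PC’s later orientation steps (Meek’s rules), uses illustrative function names, and, instead of a statistical test, relies on a hypothetical “oracle” that answers independence queries from a known graph.

```python
from itertools import combinations


def pc_skeleton_and_vstructures(nodes, is_independent):
    """Simplified PC algorithm: skeleton search, then collider orientation.

    is_independent(a, b, cond) can be any (conditional) independence test; it
    should return True when a and b are judged independent given the set cond.
    Meek's orientation rules, which full PC applies afterwards, are omitted.
    """
    adj = {v: set(nodes) - {v} for v in nodes}  # start from the complete graph
    sepset = {}
    size = 0  # size of the conditioning sets tried in this round
    while any(len(adj[v]) - 1 >= size for v in nodes):
        for a in nodes:
            for b in list(adj[a]):
                # Try every conditioning set drawn from a's other neighbours
                for cond in combinations(sorted(adj[a] - {b}), size):
                    if is_independent(a, b, set(cond)):
                        adj[a].discard(b)
                        adj[b].discard(a)
                        sepset[frozenset((a, b))] = set(cond)
                        break
        size += 1
    # Orient a -> c <- b when a and b are non-adjacent and c did not separate them
    arrows = set()
    for c in nodes:
        for a, b in combinations(sorted(adj[c]), 2):
            if b not in adj[a] and c not in sepset[frozenset((a, b))]:
                arrows.update({(a, c), (b, c)})
    return adj, arrows


if __name__ == "__main__":
    # Hypothetical ground truth X -> Z <- Y: X and Y are independent only
    # when nothing is conditioned on, so Z is a collider.
    def collider_oracle(a, b, cond):
        return {a, b} == {"X", "Y"} and not cond

    skeleton, arrows = pc_skeleton_and_vstructures(["X", "Y", "Z"], collider_oracle)
    print(sorted(arrows))  # [('X', 'Z'), ('Y', 'Z')]
```

With real data, the oracle would be replaced by a statistical test, and a test that misses rare dependence would wrongly delete edges; that is the failure a rare-dependence-aware test is meant to prevent.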

What’s next?

Li said the research could go in several directions from here. One would be to try to figure out the “exact mechanism that’s behind rare dependence in certain datasets and under what conditions these relationships happen.”

Wherever it goes, she wants to encourage other researchers to find out how this new test might be helpful to their work. “Whenever you encounter a dataset that might contain rare dependence relationships, please try our method.”
