Efficient graph mining techniques for learning structure-activity relationships
Leander Schietgat (GREYC, Caen)Graph mining is concerned with learning techniques that have graphs as input. In this work, we focus on the application of learning structure-activity relationships (SAR), in which the goal is to predict properties of molecules based on their atom-bond structure. Because graphs provide a natural representation for molecules, they have become very popular in the context of SAR: each vertex of the graph can represent an atom, while an edge represents a molecular bond. However, algorithms on graphs that aim at using all available structural information often involve the matching of subgraphs or other combinatorial operations trying to align graphs optimally.
We propose efficient graph mining algorithms that can be used for SAR learning tasks. First, we introduce an algorithm that computes a maximum common subgraph (MCS) between a pair of graphs in polynomial time thanks to the restriction to outerplanar graphs and the block-and-bridge-preserving subgraph isomorphism. This algorithm will be used in two learning techniques for SAR.
On the one hand, we will show that a metric can be constructed from the MCS algorithm. Such a metric should ideally fulfil two requirements: (1) it should be efficiently computable, which is important when analysing large molecular databases, and (2) the notion of similarity should discriminate between molecules w.r.t. the activity of interest. Finding such similarity measures is one of the current challenges in chemoinformatics. On the other hand, we will use the MCS algorithm to generate features for graphs. It is important that the features can be generated efficiently and that they preserve the molecular structure as much as possible, while at the same time having a good predictiveness.