From Elastic Degenerate Strings to Full-Text Indexing: Suffix Sorting, BWT, and FM-Index Construction and Search
Francesco Pio Marino (Univ. Catane, Italie)Elastic degenerate strings provide a flexible framework for representing sequences with structured variability, generalizing classical string models while preserving algorithmic tractability. In this talk, we present a comprehensive framework for indexing such objects, culminating in the construction of an FM-index that supports efficient pattern matching.
We begin by revisiting the notion of elastic degeneracy and its algorithmic implications. We then describe how classical building blocks—suffix sorting, the Burrows–Wheeler Transform (BWT), and wavelet trees—can be extended to this richer setting. In particular, we outline a linear-time suffix sorting approach based on an adaptation of the DC3 algorithm, followed by the construction of the BWT and the associated wavelet tree representation.
Finally, we show how these components combine into a full FM-index and describe how the classical backward search procedure can be adapted to this setting, enabling efficient pattern matching over elastic degenerate strings. The resulting framework bridges combinatorial pattern matching and compressed indexing, opening the way to scalable querying in settings where uncertainty and variability are intrinsic.