Submitted by admin on Mon, 06/10/2024 - 05:00
Motivated by DNA-based applications, we generalize the bee identification problem proposed by Tandon et al. (2019). In this setup, we transmit all $M$ codewords from a codebook over some channel and each codeword results in $N$ noisy outputs. Then our task is to identify each codeword from this unordered set of $MN$ noisy outputs. First, via a reduction to a minimum-cost flow problem on a related bipartite flow network called the input-output flow network, we show that the problem can be solved in $O(M^{3})$ time in the worst case. Next, we consider the deletion and the insertion channels individually, and in both cases, we study the expected number of edges in their respective input-output networks. Specifically, we obtain closed expressions for this quantity for certain codebooks and when the codebook comprises all binary words, we show that this quantity is sub-quadratic when the deletion or insertion probability is less than 1/2. This then implies that the expected running time to perform joint decoding for this codebook is $o(M^{3})$ . For other codebooks, we develop methods to compute the expected number of edges efficiently. Finally, we adapt classical peeling-decoding techniques to reduce the number of nodes and edges in the input-output flow network.
Johan Chrisnata
Han Mao Kiah
Alexander Vardy
Eitan Yaakobi