On the uniqueness/phase-coexistence threshold for the hard-core model

Prasad Tetali

It has long been conjectured that on the square lattice (Z^2), the hard lattice gas (hard-core) model has a critical value λc = 3.796… with the following property: if λ < λc, the model exhibits uniqueness of phase, while if λ > λc there is phase coexistence, that is, the existence of multiple Gibbs measures.

The speaker will first review the basics of this model, which is of independent interest in combinatorics, probability, statistical physics and theoretical computer science. He will then give an update on the status of the problem on the square lattice, highlighting recent efforts that have rigorously established that λc lies in the interval [2.538, 5.3506], and mentioning other recent work and several open problems.
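
For readers new to the model: a configuration is an independent set I of the lattice, weighted proportionally to λ^{|I|}. Below is a minimal sketch (my illustration, not part of the talk) of single-site Glauber dynamics for the model on an n × n torus; all names are illustrative.

    import random

    def glauber_hard_core(n, lam, steps, seed=0):
        """Single-site Glauber dynamics for the hard-core model on the
        n x n torus: occupied sites always form an independent set."""
        rng = random.Random(seed)
        occupied = set()
        for _ in range(steps):
            x, y = rng.randrange(n), rng.randrange(n)
            nbrs = [((x + 1) % n, y), ((x - 1) % n, y),
                    (x, (y + 1) % n), (x, (y - 1) % n)]
            occupied.discard((x, y))
            # If no neighbour is occupied, occupy with probability lam/(1+lam).
            if all(u not in occupied for u in nbrs) and rng.random() < lam / (1 + lam):
                occupied.add((x, y))
        return occupied

    # Heuristically: below the conjectured threshold the occupied density is
    # moderate; far above it, one of the two chequerboard phases dominates.
    print(len(glauber_hard_core(20, 1.0, 200_000)) / 400)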

Determining k

Ravi Kannan

In several problems, a basic parameter k (e.g., the number of topics in Topic Modeling, the number of clusters in Clustering, or the number of communities in Community Detection) is often assumed to be given. Two issues remain: a data-based definition of k and efficient algorithms for finding it.

In the case of Topic Modeling, we show that k is the number of extreme points of a (data-determined) polytope. We then reduce finding k to a combinatorial problem, the Approximate Set Inclusion Number (ASIN) problem. Under reasonable assumptions on the topic model, we solve ASIN in nearly linear time. Similar definitions of k work for Clustering.
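
As a toy illustration of the "k = number of extreme points" definition (mine, not the authors' ASIN algorithm): in a noiseless two-dimensional topic model, documents are convex combinations of topic vectors, so k can be read off the convex hull of the data.

    import numpy as np
    from scipy.spatial import ConvexHull

    rng = np.random.default_rng(0)
    # Three hidden "topics"; documents are convex combinations of them.
    topics = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
    weights = rng.dirichlet(alpha=[0.2, 0.2, 0.2], size=500)
    docs = weights @ topics

    # Include the pure topics so the extreme points are realised in the data;
    # every other point is a convex combination and hence not a hull vertex.
    points = np.vstack([docs, topics])
    print("estimated k =", len(ConvexHull(points).vertices))   # 3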

Joint work with Chiranjib Bhattacharyya, Navin Goyal, Jagdeep Pani, Harsha Simhadri and Mrinal Das.

Short Proofs in QBF expansion

Meena Mahajan

When reaching for an off-the-shelf industrial QBF solver for your application, how would you choose one? Knowing something about the underlying proof system can help.

The Resolution proof system for demonstrating unsatisfiability of propositional formulas can be generalised to QBFs (Quantified Boolean Formulas) in two fundamentally different ways. One of these ways is based on conflict-driven clause learning, the other is based on expansion. This talk describes these two approaches, and compares the relative strengths of the resulting systems in both the DAG-like and the tree-like versions.
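
For readers unfamiliar with the base system, here is propositional Resolution in a few lines of Python (a generic sketch of mine, not any of the QBF systems of the talk): from clauses C ∨ x and D ∨ ¬x derive C ∨ D; a CNF formula is unsatisfiable exactly when the empty clause is derivable.

    from itertools import combinations

    def resolvents(c1, c2):
        """Clauses are frozensets of ints: a positive int is a variable,
        a negative int its negation. Yield all resolvents of c1 and c2."""
        for lit in c1:
            if -lit in c2:
                yield (c1 - {lit}) | (c2 - {-lit})

    def refutable(clauses):
        """Saturate under resolution; True iff the empty clause appears."""
        clauses = {frozenset(c) for c in clauses}
        while True:
            new = {r for a, b in combinations(clauses, 2) for r in resolvents(a, b)}
            if frozenset() in new:
                return True
            if new <= clauses:
                return False
            clauses |= new

    # (x or y), (x or not y), (not x): refutable, hence unsatisfiable.
    print(refutable([{1, 2}, {1, -2}, {-1}]))   # True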

Parametric shortest paths in planar graphs

Jaikumar Radhakrishnan

For a directed graph G with n vertices, including two vertices s and t, and edge weights that vary linearly with a parameter λ, the cost of the shortest s-t path is a piecewise-linear function of λ. The number of pieces in this function is the parametric shortest path complexity of G. We show that there is a sequence of planar graphs, where the nth graph has n vertices and parametric complexity n^{c log n}, for a constant c > 0.
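
To illustrate the object being counted (a toy sketch of mine): each s-t path contributes a linear function of λ, the shortest-path cost is their pointwise minimum (a concave piecewise-linear function), and its breakpoints can be traced by sampling λ.

    # Each path has cost a + lam * b; the parametric cost is the pointwise
    # minimum of these linear functions over all s-t paths.
    paths = {"top": (1.0, 3.0), "bottom": (4.0, 0.5)}   # name -> (a, b)

    def cheapest(lam):
        return min(paths, key=lambda p: paths[p][0] + lam * paths[p][1])

    breakpoints, prev = [], cheapest(0.0)
    for i in range(1, 401):
        lam = i / 100
        cur = cheapest(lam)
        if cur != prev:
            breakpoints.append((lam, prev, cur))
            prev = cur
    print(breakpoints)   # one breakpoint near lam = 1.2: two pieces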

Graphs with similar parametric complexity were constructed by Carstensen (1983) and Mulmuley & Shah (2001), but their graphs were non-planar. Gusfield (1980) showed that the parametric complexity of an n-vertex graph is at most n^{d log n}, for a constant d > 0; hence these constructions and ours are optimal. Nikolova, Kelner, Brand and Mitzenmacher (2006) conjectured that the parametric complexity of planar graphs is polynomially bounded; our construction refutes this conjecture.

This is joint work with Kshitij Gajjar.

Minimizing weighted flow-time on a single machine

Amit Kumar

In the weighted flow-time problem on a single machine, we are given a set of n jobs, where each job has a processing requirement, a release date and a weight. The goal is to find a preemptive schedule which minimizes the sum of weighted flow-times of the jobs, where the flow-time of a job is the difference between its completion time and its release date. We give the first pseudo-polynomial-time constant-factor approximation algorithm for this problem. The running time of our algorithm is polynomial in n, the number of jobs, and P, the ratio of the largest to the smallest processing requirement of a job. Our algorithm relies on a novel reduction of this problem to a generalization of the multi-cut problem on trees, which we call the Demand Multi-Cut problem.
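
To fix the objective, here is a small sketch (my illustration; the rule shown is a naive greedy heuristic, not the approximation algorithm of the talk) that simulates a preemptive priority schedule in unit time steps and computes the total weighted flow-time.

    def weighted_flow_time(jobs):
        """jobs: list of (release, processing, weight) with integer times.
        Preemptively runs the available job with the highest
        weight / remaining-processing ratio; returns the sum of
        weight * (completion - release)."""
        t, total = 0, 0.0
        remaining = {j: p for j, (r, p, w) in enumerate(jobs)}
        while remaining:
            avail = [j for j in remaining if jobs[j][0] <= t]
            if not avail:
                t = min(jobs[j][0] for j in remaining)   # idle until a release
                continue
            j = max(avail, key=lambda j: jobs[j][2] / remaining[j])
            remaining[j] -= 1
            t += 1
            if remaining[j] == 0:
                del remaining[j]
                total += jobs[j][2] * (t - jobs[j][0])
        return total

    # Job 1 is short and heavy, so it preempts job 0 upon release at time 1.
    print(weighted_flow_time([(0, 3, 1.0), (1, 1, 5.0)]))   # 9.0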

This is joint work with Jatin Batra and Naveen Garg.

List decoding using double samplers

Prahladh Harsha

Samplers are bipartite graphs in which large sets of vertices are expanding. Sampler graphs are used in error-correcting code constructions to amplify the distance of an error-correcting code (cf. the ABNNR construction). This construction admits a unique-decoding algorithm, but it is not known to be efficiently list-decodable.
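
A minimal sketch of the distance-amplification step (my illustration of the ABNNR idea, with a toy graph): codeword symbols sit on the left vertices, each right vertex outputs the tuple of its neighbours' symbols, and so a single position where two codewords differ shows up at every right vertex that sees it.

    def abnnr_encode(codeword, right_nbhds):
        """codeword: symbols on left vertices; right_nbhds: for each right
        vertex, the tuple of left indices it is adjacent to."""
        return [tuple(codeword[i] for i in nbrs) for nbrs in right_nbhds]

    # Two codewords differing in 1 of 4 positions...
    c1, c2 = [0, 1, 0, 1], [0, 1, 0, 0]
    # ...and a bipartite graph in which each right vertex sees two left ones.
    nbhds = [(0, 3), (1, 3), (2, 3), (0, 2)]
    e1, e2 = abnnr_encode(c1, nbhds), abnnr_encode(c2, nbhds)
    print(sum(a != b for a, b in zip(e1, e2)) / len(e1))   # distance 0.25 -> 0.75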

In this talk, I will introduce the notion of "double samplers", which extend sampler graphs to three layers, and show a polynomial-time algorithm which list-decodes the ABNNR code construction on a double sampler. Double samplers can be constructed from high-dimensional expanders, and this is currently the only known construction.

Joint work with Irit Dinur, Tali Kaufman, Inbal Livni Navon and Amnon Ta-Shma.

Firefighting with Critical Nodes

Neeldhara Misra

The firefighting game is a turn-based game played on a graph, where the fire spreads from a source to vertices in a breadth-first manner, and firefighters can be placed on as-yet unburnt vertices on alternate rounds to block the fire. The goal is to come up with a strategy for placing firefighters so as to intercept the spread of the fire. The most natural algorithmic question associated with this game is to find a strategy that optimizes some desirable criterion: for instance, maximizing the number of saved vertices, or minimizing the number of rounds, the number of firefighters per round, or the number of burned vertices. These questions are well studied in the literature, and while most variants are NP-hard, approximation and parameterized algorithms have been proposed for various scenarios.
In this talk, we will survey some of the known results and techniques for solving the firefighting problem, with a special focus on the variant where the goal is to save a critical subset of nodes. In this context, we will draw connections with the notions of important separators and tight separator sequences. We will also contemplate possible relationships between this problem and other models of information spread on networks.
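
As a concrete illustration of the game (a sketch of mine, not from the talk), the following plays out one strategy on a graph: in each round one vertex is protected, then the fire spreads one breadth-first level.

    from collections import deque

    def burnt_vertices(adj, source, strategy):
        """adj: vertex -> list of neighbours; strategy: one vertex to
        protect per round. Returns the set of vertices that burn."""
        burnt, protected = {source}, set()
        frontier, guards = [source], deque(strategy)
        while frontier:
            if guards:
                g = guards.popleft()
                if g not in burnt:
                    protected.add(g)
            nxt = []
            for v in frontier:
                for u in adj[v]:
                    if u not in burnt and u not in protected:
                        burnt.add(u)
                        nxt.append(u)
            frontier = nxt
        return burnt

    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    # Protecting vertex 2 in round one confines the fire to {0, 1}.
    print(burnt_vertices(path, 0, [2]))   # {0, 1}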

This is joint work with Jayesh Choudhari, Anirban Dasgupta and M. S. Ramanujan.

Non-malleable Encodings

Bhavana Kanukurthi

Non-malleable Encodings provide the following protection against tampering: the decoded message is either the original message or one completely independent of it. They have many applications to cryptography, with connections to pseudo-random objects such as extractors, expanders and so on. In this talk, I will survey some of our recent results in this area.

Correlations in Cryptography

Manoj Prabhakaran

The study of correlated random variables is at the heart of fields like statistics and information theory. Correlations also have great significance to cryptography. In this talk we shall explore the fascinating world of correlations through the lens of cryptography and multi-party computation.

Bouquet Algorithms for Robust Query Processing

Vinayaka Pandit

Selectivity estimates used when optimizing OLAP queries often differ significantly from those actually encountered during query execution, leading to poor plan choices and inflated response times. Recently, a new class of algorithms, called Bouquet Algorithms, has been proposed to deal with the problems arising out of errors in selectivity estimates. In this approach, the selectivity estimation process is abandoned entirely and replaced with a calibrated discovery mechanism. The beneficial outcome of this new construction is that, for the first time, provable guarantees are obtained on worst-case performance, thereby facilitating robust query processing. The first bouquet algorithm, called Plan Bouquet (PB) -- while providing the breakthrough property of a guaranteed bound on the sub-optimality of the query execution time -- suffered from the fact that the bound depended on the operating platform of query processing. In this talk, we will present bouquet algorithms that retain the strength of the PB approach while providing the additional benefit that the provable sub-optimality bounds are independent of the operating platform.
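
To convey the flavour of a calibrated discovery mechanism (a much-simplified sketch of my own, not the PB algorithm itself): instead of trusting a selectivity estimate, one can execute candidate plans under geometrically increasing cost budgets, aborting any execution that exceeds its budget, until some execution completes; the doubling schedule is what makes the total wasted work boundable.

    def bouquet_execute(plans, start_budget=1.0):
        """plans: callables where plan(budget) returns a result if the plan
        finishes within `budget` cost units, and None if it was aborted.
        Budgets double until some plan completes."""
        budget = start_budget
        while True:
            for plan in plans:
                result = plan(budget)
                if result is not None:
                    return result, budget
            budget *= 2

    def make_plan(true_cost, answer):   # toy plan; true cost unknown a priori
        return lambda budget: answer if true_cost <= budget else None

    plans = [make_plan(37.0, "result via plan A"),
             make_plan(9.0, "result via plan B")]
    print(bouquet_execute(plans))   # ('result via plan B', 16.0)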

Games are Algorithms

Christos Papadimitriou

Games are mathematical frameworks for predicting the behavior of rational agents in situations of conflict, and their study began at about the same time as the serious exploration of computer algorithms -- indeed, the same brilliant mathematician launched both intellectual enterprises. One would have expected that the two disciplines would proceed hand in hand, with algorithmic methods employed for predicting the outcome of a game. However, over the past seven decades the central predictive concept of game theory has been the Nash equilibrium which, for a variety of reasons that have now become more and more apparent, is inherently non-algorithmic. The field of game dynamics -- modeling through continuous- or discrete-time dynamics the unfolding of the players' actions as they react to all others -- is explicitly algorithmic, but has for the most part been applied to the fruitless goal of discovering dynamics guaranteed to converge to the Nash equilibrium. We develop a theory, and ensuing solution concepts, that capture the limit behavior of players in any game. The main mathematical tool employed is Conley's fundamental theorem on the topology of dynamical systems. (Joint work with Georgios Piliouras.)
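
A tiny numerical illustration of the phenomenon (mine, not from the talk): replicator dynamics on Matching Pennies orbits around the unique mixed Nash equilibrium instead of converging to it, which is exactly the kind of limit behavior that solution concepts beyond equilibrium must capture.

    def replicator_step(p, q, eta=0.1):
        """p, q: probabilities of playing Heads for the matcher and the
        mismatcher in Matching Pennies; Euler step of replicator dynamics."""
        dp = p * (1 - p) * (4 * q - 2)    # matcher gains by matching
        dq = q * (1 - q) * (2 - 4 * p)    # mismatcher gains by differing
        return p + eta * dp, q + eta * dq

    p, q = 0.6, 0.6
    for t in range(2001):
        if t % 500 == 0:
            print(t, round(p, 3), round(q, 3))   # orbits around (0.5, 0.5)
        p, q = replicator_step(p, q)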

Algorithmic synthesis of complex biomolecules

Mukund Thattai

An algorithm converts inputs to corresponding unique outputs through a sequence of actions. Algorithms are used as metaphors for complex biological processes such as organismal development. Here we make this metaphor rigorous for the synthesis of the branched carbohydrates known as glycans. These molecules play a key role in conveying cellular identity and self/non-self information across all domains of life; for example, the famous A/B blood group antigens are glycans. In eukaryotic cells (including human cells) glycans are synthesized by collections of enzymes in a factory-line-like system of compartments known as a Golgi apparatus. The Golgi can stochastically convert a single input molecule to a heterogeneous set of possible outputs; yet in living cells the observed diversity of the outputs is very small. Here we resolve this paradox by borrowing from the theory of algorithmic self-assembly. We rigorously enumerate the sources of glycan diversity: incomplete glycans via early exit from the reaction compartment; tandem repeat glycans via runaway reactions; and competing glycan fates via divergent reactions. We demonstrate how to diagnose and eliminate each of these, thereby obtaining "algorithmic compartments" that convert inputs to corresponding unique outputs. Given an input and a target output we either prove that the output cannot be algorithmically synthesized from the input, or explicitly construct a succession of algorithmic Golgi compartments that achieves this synthesis. Our theoretical analysis allows us to infer the causes of non-algorithmic microheterogeneity and species-specific diversity in real glycan datasets.
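
A toy formalization of "algorithmic compartments" in the spirit of the abstract (my sketch; the real setting involves enzymes acting on branched trees, not strings): model a compartment as a set of rewrite rules and call it algorithmic on an input if every maximal reaction sequence terminates at the same single output.

    def outputs(s, rules, depth=20):
        """All terminal strings reachable from s by the rewrite rules
        (lhs, rhs); `depth` caps runaway (tandem-repeat) reactions."""
        if depth == 0:
            return {s + "..."}            # runaway reaction, truncated
        nexts = set()
        for lhs, rhs in rules:
            i = s.find(lhs)
            while i != -1:
                nexts.add(s[:i] + rhs + s[i + len(lhs):])
                i = s.find(lhs, i + 1)
        if not nexts:
            return {s}                    # no enzyme applies: a terminal glycan
        return set().union(*(outputs(t, rules, depth - 1) for t in nexts))

    print(outputs("AA", [("A", "B")]))             # {'BB'}: algorithmic (confluent)
    print(outputs("A", [("A", "B"), ("A", "C")]))  # {'B', 'C'}: divergent fates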

DNA Sequencing for Disease Diagnosis

Ramesh Hariharan

This is not so much a theory talk; rather, it will describe the application of string algorithms to the use of DNA sequencing for diagnosing genetic disorders.

It describes a recent book of real stories about the search for genomic spelling errors that have stark consequences: infants who pass away mysteriously, siblings with misplaced organs, a family with several instances of vision loss, sisters whose hearts fail in the prime of their youth, a boy whose blood can’t carry enough oxygen, a baby with cancer in the eye, a middle-aged patient battling cancer, and the author’s own color blindness. The search in each case proves to be a detective quest that connects the world of medical practice with that of molecular biology, traversing the world of computer algorithms along the way. These stories come from first-hand experience gained in our genome sequencing diagnostic laboratory in India.