Random signed SAT

This paper was long in the making, but Kathrin Ballerstein and I have finally completed the final draft of our paper An algorithm for random signed 3-SAT with Intervals. Signed SAT is a "multi-valued logic" version of the classical Satisfiability (SAT) problem. In both cases, one is given as input a set of $n$ variables and a set of $m$ clauses, each consisting of $k$ "literals", and one seeks an assignment of values to the variables such that in each clause, at least one literal is satisfied. Whereas in classical SAT the variables are boolean and the literals have the form "$x$" or "$\lnot x$", in signed SAT the literals have the form $x \in S$, where $S$ is a member of some set $\mathcal S$ which is fixed (i.e., not part of the input).

Signed SAT has been around for a while. The best starting point for references is probably the paper "The Helly property and satisfiability of Boolean formulas defined on set families" by Chepoi, Creignou, Hermann, and Salzer (simply because it is the most recent I know of), but the groundwork was laid by Manyà and Hähnle in the 90s.

In our paper, we study a variant where the set $\mathcal S$ comprises all intervals contained in $[0,1]$; we refer to it as iSAT. We study what a proper adaptation of the well-known Unit Clause algorithm for classical random SAT does for our interval Satisfiability problem. Since a Unit Clause algorithm alone is usually not much good, we enhance it with a "repair" (or "very limited backtracking") option, and analyze the whole thing using Wormald's famous ODE method. Here are some details about the paper.
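To make the definitions concrete, here is a minimal Python sketch of an iSAT instance and a satisfaction check; the representation and names are mine, not from the paper.

```python
# A literal is a pair (variable index, interval); a literal (x, (a, b)) is
# satisfied iff the value assigned to x lies in the interval [a, b].

def literal_satisfied(assignment, var, interval):
    a, b = interval
    return a <= assignment[var] <= b

def formula_satisfied(assignment, clauses):
    # A formula is satisfied iff every clause contains a satisfied literal.
    return all(
        any(literal_satisfied(assignment, v, iv) for (v, iv) in clause)
        for clause in clauses
    )

# Example: two variables, two clauses with two interval literals each.
clauses = [
    [(0, (0.0, 0.3)), (1, (0.5, 1.0))],
    [(0, (0.6, 1.0)), (1, (0.0, 0.4))],
]
print(formula_satisfied([0.2, 0.1], clauses))  # True
```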
Algorithms which repair their mistakes
Say you have a smart algorithm which proceeds through a number, say $\Theta(n)$, of iterations. In each iteration, the probability of it succeeding is quite large. For example, let us say that, in each iteration, the failure probability is $\Theta(r(n)/n)$ for some $r(n) = o(n)$. Due to the large number of iterations, though, the expected number of failures is $\Theta(r(n))$, which is too large even for constant $r$. The remedy is, of course, to let the algorithm check, in each iteration, whether it has made a mistake, and if that is the case, repair the damage. There are certain complications with this approach, mainly because the algorithm has already "looked at" some of the random input data, so that, during the repair, we have to assume that the distribution of some of the data is changed by conditioning on the earlier choices of the algorithm. However, there's a trick in this situation: while the overall algorithm (likely) chooses its actions fairly intelligently, for the repair part it is ok to proceed in a fairly stupid way. This is so because all the repair part needs to do is push its own failure probability (conditioned on the fact that a repair was necessary and on the previous choices of the algorithm) to $o(1/r)$. For example, for constant $r$, it suffices that the repair part fails with probability tending to zero arbitrarily slowly. In our paper, we bound the failure probability of the "smart" algorithm by $O((\mbox{polylog}\,n)/n)$, and the "stupid" repair part fails with conditional probability $O((\mbox{polylog}\,n)/n)$, too.
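The arithmetic of this pattern is easy to see in a schematic simulation; the following sketch is mine and only mimics the failure probabilities, not our actual algorithm.

```python
import random

def run(n, step_failure_prob, repair_failure_prob):
    """One run of n iterations; a mistake counts only if its repair also fails."""
    for _ in range(n):
        if random.random() < step_failure_prob:        # the "smart" step errs
            if random.random() < repair_failure_prob:  # the "stupid" repair errs too
                return False
    return True

n, trials = 10**5, 200
# Without repair (repair always "fails"): expected n * (1/n) = 1 mistake per run,
# so each run succeeds only with probability about 1/e.
print(sum(run(n, 1 / n, 1.0) for _ in range(trials)), "/", trials)
# With a repair that itself fails with probability 1/n: the overall failure
# probability drops to about n * (1/n) * (1/n) = 1/n.
print(sum(run(n, 1 / n, 1 / n) for _ in range(trials)), "/", trials)
```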
Branching system
As is often the case with SAT, the analysis of our algorithm entails studying properties of a branching system. Of interest here is the total population size, for which we need the expectation and a tail probability estimate. It is actually closer to the problem to speak of a "discrete-time queue", but at the heart of it is (as is the case with queues) a branching system. I have written two posts (How to analyze a discrete time queue and Tail probability estimate for the busy period) on discrete-time queues in preparation for the completion of our paper. So there's not much to add here, except the one thing that needs to be dealt with almost always in the "real world": random variables are almost never completely independent, and also almost never identically distributed. There's generally a small amount of dependence and deviation from the "right" distribution, which kills the textbook- (or Wikipedia-)level theorems one would like to apply, and results in several pages of estimates, inequalities, coupling, and conditioning on the parameter regions in which it all works, leading to more estimates and inequalities. Personally, I find that these complications really are a pain: they are usually not difficult, but I never know in how much detail to write them down. The same problem also occurs at a point where we would usually use Wald's equation, but the summands are neither (completely) independent nor (completely) identically distributed. Fortunately, though, at this point we can fall back on Optional Stopping.
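For concreteness, here is a toy simulation (assumptions and parameters mine) of the textbook situation: a discrete-time queue serving one item per step with i.i.d. arrivals, whose busy period is exactly the total population of the underlying branching system.

```python
import random

def offspring():
    # Two potential arrivals per step, each with probability 0.3 (mean 0.6 < 1,
    # so the system is subcritical and the busy period is finite in expectation).
    return (random.random() < 0.3) + (random.random() < 0.3)

def busy_period(max_steps=10**6):
    """Steps until the queue empties, starting from a single item."""
    queue, steps = 1, 0
    while queue > 0 and steps < max_steps:
        queue += offspring() - 1  # one departure per step, random arrivals
        steps += 1
    return steps

samples = [busy_period() for _ in range(10**5)]
# For mean offspring m < 1, the expected total population is 1 / (1 - m) = 2.5.
print(sum(samples) / len(samples))
```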
Wormald's ODE method
A black-box theorem by Wormald allows one to control parameters of a random process by writing down the conditional expectations of their changes from one iteration to the next in the form of an initial value problem. While I am generally not a big fan of black-box theorems, I quite fancy this one. The basic idea behind the whole method is the simple trick "if you know how the conditional expectations behave, try to find a martingale and use the Azuma-Hoeffding inequality", but the way it is used to prove the ODE theorem is quite nice.
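A standard toy illustration of the method (balls into bins, not anything from our paper): track the number of empty bins, whose one-step drift determines an ODE that the scaled process follows.

```python
import math, random

# Throw t = sn balls into n bins and let Y(t) be the number of empty bins.
# Then E[Y(t+1) - Y(t) | Y(t)] = -Y(t)/n, so with y(s) ~ Y(sn)/n the ODE
# method predicts concentration around the solution of the initial value
# problem y' = -y, y(0) = 1, i.e. y(s) = exp(-s).

n = 10**5
bins = [0] * n
empty = n
for _ in range(n):  # run until s = 1
    i = random.randrange(n)
    if bins[i] == 0:
        empty -= 1
    bins[i] += 1

print(empty / n, math.exp(-1))  # simulation vs. ODE prediction, both about 0.368
```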
A little bit of ODE calculus
Even though this is tremendously simple from an analysis point of view, we do some computations with the system of ODEs which we obtain, and transform it into a single initial value problem by a change of parameters. The resulting IVP is still not solvable analytically (I guess), but it is a lot easier to give estimates on how the solution behaves, and also to solve it numerically, which we do in order to determine quantitative properties of the input data for which our algorithm succeeds. More specifically, the initial ratio of the number of clauses to the number of variables must be such that the solution to the IVP does not intersect a certain line before it falls below a certain other function. The line marks the point at which the queue length (and thus the length of the busy period) of the discrete-time queue crosses from finite to infinite.
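The numerical check then has roughly the following shape; the right-hand side and the critical line below are placeholders of my choosing, not the actual IVP from the paper.

```python
from scipy.integrate import solve_ivp

def f(s, y):
    # Placeholder right-hand side of the IVP.
    return -y * (1.0 + s)

def hits_critical_line(s, y):
    # Zero exactly when the solution meets the (placeholder) critical line.
    return y[0] - (0.9 - 0.5 * s)

hits_critical_line.terminal = True  # stop the integration at the crossing

sol = solve_ivp(f, (0.0, 2.0), [1.0], events=hits_critical_line)
if sol.t_events[0].size > 0:
    print("solution hits the critical line at s =", sol.t_events[0][0])
else:
    print("solution stays clear of the critical line")
```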