I've given the below a more formal presentation in a note (PDF). If you're interested in this open problem, you can find more details and omitted proofs there.
If you believe that tools from additive combinatorics are highly relevant to research in algorithms (some people do), it's natural to wonder about applications that run in the opposite direction: what can algorithms research tell us about additive combinatorics?
This blog post considers one such question that I think is natural. We start with the notion of a sumset: given an integer set \(A\), its sumset is the set
$$ A+A := \{a + b \; | \; a, b \in A \}$$
containing every sum of two integers chosen from \(A\). The ratio of the size of the sumset \(|A+A|\) to the original set \(|A|\) is an important indicator of "additive structure": up to constants, the statements
- \(|A + A| \leq C_1 |A|\),
- \(A + A + ... + A,\) with \(s\) summands, has cardinality at most \( (C_2)^s |A| \), and
- \(A\) can be contained in a generalized arithmetic progression of dimension \(C_3\) and volume \(C_4 |A|\),
among others, are all equivalent. Additive combinatorics is, more or less, the study of sumsets.
So: given a set \(S\), is it a sumset? (That is, can \(S\) be expressed as \(A+A\) for some \(A\)?) Designing algorithms to answer this question tests our understanding of integer behavior under addition and might, I hope, lead to further discoveries.
Problem Statement:
Sumset Containment (SSC):
Input: An (integer) set \(S\) of cardinality \(n\) and a parameter \(k \leq n\).
Output: A solution \(A\) satisfying \(A + A \subseteq S\) and \(|A + A| \geq k\).
This formulation includes the extra parameter \(k\), but if we set \(k = n\) we recover our original question, which I'll call Sumset Equivalence (SSE). (We could also parameterize in the size of the set \(|A|\); this turns out to be exactly equivalent to \(k\)-Clique on a certain graph, as we'll see below.)
Exact Algorithms:
The only approaches to this problem that I'm aware of begin by constructing a candidate set \(C\) satisfying the property that \(C \supseteq A\) for every possible solution \(A\). Once an algorithm has constructed \(C\), it can search for solutions within it.
Approach 1. For example, if there exists a set \(A\) such that \(A + A \subseteq S\), we know that
$$ A \subset C := S/2 = \{ s / 2 \; | \; s \in S \}. $$
In this case, \(C\) contains \(n\) elements, so we can do a brute force search for solutions in time \(O^*(2^n)\). (The \(O^*\) notation hides polynomial factors.)
Approach 2. Even better, we can construct the candidate graph
$$G_C := (C, \{(c_i, c_j) \; | \; c_i + c_j \in S\})$$
and observe that any set \(A \subseteq C\) satisfies \(A + A \subseteq S\) if and only if \(A\) induces a clique in \(G_C\) (and \(G_C\) includes every self-loop \((c, c)\) on a clique vertex). Moreover, the clique corresponds to a solution if and only if it covers edges \((c_i, c_j)\) associated with at least \(k\) different sums \(c_i + c_j\).
Because every solution clique is a subgraph of some maximal solution clique, to solve the problem, it suffices to check every maximal clique in the candidate graph. By a classic result, a graph on \(m\) vertices contains at most \(3^{m/3}\) maximal cliques; moreover, these can be efficiently enumerated.
This gives us a better algorithm: first, build the candidate set \(C\) and the candidate graph \(G_C\), then check every maximal clique in \(G_C\) to see if it corresponds to a solution. This takes time \(O^*(3^{|C|/3}) = O^*(3^{n/3})\) if we use the \(n\)-element candidate set defined above.
Approach 3. A further improvement in runtime comes from building a better candidate set. Suppose our algorithm is given a slightly modified "promise problem" version of SSC: in addition to the set \(S\), the algorithm is guaranteed that some solution \(A\) exists and is given \(a_1\) and \(a_m\), the smallest and largest elements of \(A\), respectively.
We claim that \(A\) is a subset of the candidate set
$$ C' := (S - a_1) \cap (S - a_m) \cap S/2 \cap [a_1 : a_m] $$
and furthermore that \(|C'| \leq \lceil n/2 \rceil\) (proof in the attached PDF). Solving the SSC by enumerating the maximal cliques of the candidate graph \(G_{C'}\) takes time \(O^*(3^{|C'|/3}) = O^*(3^{n/6})\).
Finally, we observe that the promise of \(a_1\) and \(a_m\) to the algorithm is without loss of generality: since \(a_1\) and \(a_m\) (defined with respect to a certain solution \(A\)) are elements of the set \(S/2\), we can run our algorithm \(O(n^2)\) times, one for every possible pair \((a_1, a_m)\). Because our algorithm can easily check for false positives, the iterations that make incorrect guesses fail harmlessly.
Open Problems:
SSC is both NP-complete and W[1]-complete (parameterized in \(k\)), so efficient exact or parameterized algorithms appear unlikely. However, improvements in exact algorithms could still be interesting: in particular, improving on the \(O^*(3^{n/6})\)-time algorithm above seems like it might require a more sophisticated use of the structure of sumsets on integers. Open problems include:
Open Problem 1: Does there exist an \(O^*(3^{(1/6 - \epsilon)n})\)-time algorithm for SSC or SSE, for some constant \(\epsilon > 0\)?
Open Problem 2: How hard are SSC/SSE on other groups besides the integers? (For example, on the Boolean cube \(\mathbb{F}_2^n\), the problem remains NP-complete and can be solved in time \(O^*(2^n)\), but easy improvements might be possible here.)
Open Problem 3: Design algorithms that determine whether \(A\) contains more complicated sum and difference sets. For example, does \(S\) contain a large sumset \(A + B\) for some \(A\), \(B\) (ignoring trivial solutions like \(S = S + \{0\}\))? What about sets of the form \(rA + sB\) for integers \(r\) and \(s\)?