Computational Complexity (Cabello)
Warning: The notes are not optimized for HTML (yet). Without warranty.
Lecture 1: Motivation
For a matrix \(A \in \R^{n \times n}\) we usually define the determinant as \[ \det(A) = \sum\limits_{\sigma} \sgn(\sigma) \cdot \prod\limits_{i \in [n]} a_{i, \sigma(i)}. \]
We can view this as a computational problem: the input is the matrix \(A\) and the output is the determinant \(\det(A)\). How quickly can we compute it?
- We can compute it in \(O(n!)\) time by directly applying the definition. By Stirling's approximation, this is roughly \(O(\sqrt{2\pi n} (\frac{n}{e})^n)\) time, which is very inefficient.
- Using Gaussian elimination, i.e. a decomposition \(A = LUP\), we can compute the determinant as \(\det(A) = \det(L) \cdot \det(U) \cdot \det(P)\). This takes \(O(n^3)\) arithmetic operations.
- We can compute it in \(O(n^{\omega})\) time by using fast matrix multiplication algorithms, where \(\omega < 2.375\).
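The gap between the first two algorithms can be checked concretely. A minimal sketch (ours, not from the lecture; exact rationals via `fractions` sidestep the growth-of-entries issue discussed below): the determinant straight from the definition versus Gaussian elimination.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def sign(perm):
    """Sign of a permutation (given as a tuple) via its inversion count."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def det_naive(A):
    """O(n! * n): the sum over all permutations, straight from the definition."""
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def det_gauss(A):
    """O(n^3) arithmetic operations via Gaussian elimination with pivoting."""
    A = [[Fraction(x) for x in row] for row in A]   # work on an exact copy
    n, det = len(A), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)                      # singular matrix
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]     # row swap flips the sign
            det = -det
        det *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return det
```

Already for \(10 \times 10\) matrices the difference is dramatic: \(10! \approx 3.6 \cdot 10^6\) permutations versus a few hundred arithmetic operations.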
This gives us a general view of what happens.
- The jump from the first algorithm to the second is significant: we go from a superexponential to a polynomial algorithm.
- When we get a polynomial time algorithm, we try to reduce it as much as possible.
- We cannot get linear time, since the input is itself quadratic.
- Whether an \(O(n^2)\) algorithm exists, we do not know.
- The state of the art is \(O(n^{2.375})\).
There's a problem with the example though. The \(O(n^3)\) bound counts sums and products, but over \(\Z\) or \(\Q\) the entries can grow during the elimination, so these operations are not constant time. Choosing the pivots carefully still gives polynomial time, but the true cost can be much greater than \(n^3\).
We can fix this by working over a finite field, say \(\Z_7\), where every element has constant size; over the integers there is fundamentally more information in the entries as \(n\) grows. Picking a finite field is a common "hack".
We should be careful with this. Nobody is careful with this.
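As an illustration of the hack, a sketch of elimination over \(\Z_p\) (the function name and the default \(p = 7\) are our choices): every arithmetic operation now works on numbers smaller than \(p\), hence is genuinely constant time.

```python
def det_mod_p(A, p=7):
    """Determinant over Z_p via Gaussian elimination; O(n^3) word operations."""
    A = [[x % p for x in row] for row in A]
    n, det = len(A), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = (-det) % p
        det = det * A[col][col] % p
        inv = pow(A[col][col], p - 2, p)   # Fermat: a^(p-2) = a^(-1) mod p
        for r in range(col + 1, n):
            factor = A[r][col] * inv % p
            for c in range(col, n):
                A[r][c] = (A[r][c] - factor * A[col][c]) % p
    return det
```

The modular inverse via `pow(a, p - 2, p)` is itself polynomial time (fast exponentiation), so the whole algorithm stays polynomial in the input size.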
The permanent of a matrix is defined like the determinant, but without the signs: \[ \operatorname{perm}(A) = \sum\limits_{\sigma} \prod\limits_{i \in [n]} a_{i, \sigma(i)} \]
This is an analogous problem which can again be computed from the definition in \(O(n!)\) time. But here we do not know of a polynomial time algorithm.
- For instance, the permanent of a 0-1 matrix, computed over \(\Z\), is the number of perfect matchings in the bipartite graph with that biadjacency matrix.
- Note that we can compute a single perfect matching in a bipartite graph (optimisation methods/linear programming).
- We do not know how to count such matchings in polynomial time.
- Can we generate a perfect matching uniformly at random in the set of perfect matchings in a bipartite graph? We again do not know how to do it in polynomial time.
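On small instances the correspondence can be checked directly. A sketch (ours), using the \(O(n!)\) definition: the permanent of a 0-1 biadjacency matrix counts the perfect matchings of the bipartite graph.

```python
from itertools import permutations
from math import prod

def permanent(A):
    """Permanent by definition: like the determinant, but without signs.
    For a 0-1 biadjacency matrix this counts perfect matchings."""
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))
```

For example, the all-ones \(3 \times 3\) matrix is the biadjacency matrix of \(K_{3,3}\), which has \(3! = 6\) perfect matchings.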
Another example is primality testing: given a natural number \(n\), is it prime? The naive algorithm checks divisors up to \(n\) (or up to \(\sqrt{n}\)). Assuming that checking \(i \mid n\) takes constant time, this gives \(\approx\sqrt{n}\) steps.
Here's a trap. This is polynomial in \(n\), but not in \(\log_2 n\). The correct measure of input size is the number of digits. Then \(\sqrt{n} = 2^{\frac{1}{2}\log_2 n}\), which is exponential in the input size.
Since the 70s, a randomized algorithm has been known that tests primality in time polynomial in \(\log_2 n\). It has no false negatives, but may have false positives: a prime is always declared prime, while a composite number may be declared prime with probability \(\leq \frac{1}{4}\). This is randomized polynomial time. To improve the odds, we can rerun the algorithm repeatedly.
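A sketch of such a test - this is the Miller-Rabin test, one standard instance of the above (the default round count is an arbitrary choice of ours):

```python
import random

def miller_rabin(n, rounds=20):
    """Probably-prime test: never rejects a prime; accepts a composite
    with probability at most 4^(-rounds)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1              # n - 1 = 2^s * d with d odd
    for _ in range(rounds):
        a = random.randrange(2, n) if n > 3 else 2
        x = pow(a, d, n)                  # fast modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                  # witness of compositeness found
    return True
```

Each round costs \(O(\log n)\) multiplications, so the whole test is polynomial in the number of digits, exactly as the naive \(\sqrt{n}\) algorithm is not.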
In 2002, Agrawal, Kayal and Saxena showed that there exists a deterministic polynomial time algorithm which recognises prime numbers.
A related problem is factorization: for a natural number, return a nontrivial divisor or report that it is prime.
We do not know how to factorize in polynomial time (deterministic or randomized).
- If problem \(A\) can be solved in polynomial time, then problem \(B\) can be solved in polynomial time. (Reduction)
- If we have sets of problems with some given properties \(X, Y\), we're looking at \(X \subseteq Y\) or \(X = Y\).
- Problem \(X\) needs at least time/space \(\approx n^2\), say. (Lower bound)
Lower-bound results are very few, and usually very weak.
Lecture 2: Basic definitions
If we have integers \(a, b\) of length \(n\), we learned in primary school to add them in \(O(n)\) and to multiply them in \(O(n^2)\). In fact, multiplication can be done in \(O(n\log n)\) time, but only by a galactic algorithm. More practical FFT-based algorithms run in \(O(n \log^2 n)\).
Still, we do not know whether this \(\log\) factor is necessary, i.e. whether multiplication really is more computationally demanding than addition.
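The primary-school algorithms can be written down directly. A sketch (ours), with numbers as lists of digits, least significant digit first:

```python
def add(a, b, base=10):
    """O(n) schoolbook addition of digit lists (least-significant first)."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % base)
        carry = s // base
    if carry:
        out.append(carry)
    return out

def mul(a, b, base=10):
    """O(n^2) schoolbook multiplication: n shifted single-digit products."""
    out = [0] * (len(a) + len(b))
    for i, da in enumerate(a):
        carry = 0
        for j, db in enumerate(b):
            s = out[i + j] + da * db + carry
            out[i + j] = s % base
            carry = s // base
        out[i + len(b)] += carry
    while len(out) > 1 and out[-1] == 0:   # trim leading zeros
        out.pop()
    return out
```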
An encoding is a function \[ e : \text{ inputs} \rightarrow \left\{ 0, 1 \right\}^{ * }. \]
(We could use some other alphabet, but we use binary encodings on the computer.)
We assume fixed explicit encodings in this course. This means the input is always given explicitly: e.g. a number could be given recursively in terms of its digits, but we still always assume it to be written out explicitly.
A decision problem is a computational problem with boolean output.
The fiber of true becomes interesting as an object of study. When we view it in an encoding, it is \[ L = \left\{ w \in \left\{ 0, 1 \right\}^{ * } \mid P(w) \right\} , \] which is a language. So decision problems are recognising a language. (For now?) we only care about decidable languages.
An additional detail to sweep under the rug is that we assume \(P(w)\) also checks the validity of \(w\), i.e. we never think about encoding problems as such.
\[ \begin{aligned} f &= O(g) \iff \exists C > 0 .\exists n_0 \in \N . \forall n \geq n_0 . f(n) \leq C \cdot g(n) \\ f &= \Omega(g) \iff g = O(f) \\ f &= \Theta(g) \iff f = O(g) \text{ and } g = O(f) \end{aligned} \] We also use the little \(o\) notation, which indicates a strict upper bound: \[ f = o(g) \iff \forall C > 0 . \exists n_0 \in \N . \forall n \geq n_0 . f(n) \leq C g(n) \] i.e. once \(n\) is large enough, \(g\) dominates \(f\) no matter how small the constant. We get \[ f = o(g) \implies \lim_{n \rightarrow \infty} \frac{f(n)}{g(n)} = 0 \]
Likewise we define \[ f = \omega(g) \iff g = o(f). \]
For computational complexity, we want an explicit notion of time and space, which is better done in Turing machines than, say, \(\lambda\)-calculus. The definition is very robust - we have several equivalent variants we can deploy when need be.
We define it as \((\Gamma, \sqcup, Q, q_s, q_h, \delta)\), where \(\delta : Q \times \Gamma \rightarrow Q \times \Gamma \times \left\{ -1, 0, 1 \right\}\) is the transition function.
The definition of the TM doesn't itself speak of the tape contents. We therefore also define a configuration of the TM \(M\) as \((q, t, i)\), where \(q\) is the state of the control unit, \(t : \Z \rightarrow \Gamma\) the state of the tape and \(i \in \Z\) the position of the head. The initial configuration is then \((q_s, \mathrm{in}, 0)\); the input is always written starting at position 0, extending to the right.
A TM \(M\) recognizes a language \(L\) if
- For every input, \(M\) halts
- \(\forall \omega \in L .\,\) applying \(M\) to \(\omega\) finishes with \(q_{\text{accept}}\)
- \(\forall \omega \not\in L .\,\) applying \(M\) to \(\omega\) finishes with \(q_{\text{reject}}\)
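To make the definitions concrete, here is a small simulator (entirely our own sketch: states and symbols are strings, the tape is a dictionary, and the example machine - accepting binary words with an even number of 1s - is illustrative, not from the lecture):

```python
BLANK = '_'

def run_tm(delta, q_start, q_accept, q_reject, word):
    """Run a deterministic 1-tape TM until it halts; return True iff it accepts.

    delta maps (state, symbol) -> (new state, symbol to write, move in {-1, 0, 1}).
    """
    tape = {i: c for i, c in enumerate(word)}   # input starts at position 0
    q, head = q_start, 0
    while q not in (q_accept, q_reject):
        q, written, move = delta[(q, tape.get(head, BLANK))]
        tape[head] = written
        head += move
    return q == q_accept

# Illustrative machine: accept binary words with an even number of 1s.
even_ones = {
    ('even', '0'): ('even', '0', 1),
    ('even', '1'): ('odd', '1', 1),
    ('odd', '0'): ('odd', '0', 1),
    ('odd', '1'): ('even', '1', 1),
    ('even', BLANK): ('accept', BLANK, 0),
    ('odd', BLANK): ('reject', BLANK, 0),
}
```

Note that this machine halts on every input, so by the definition above it recognizes the language of binary words with an even number of 1s.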
There exist undecidable languages.
An example is the halting problem.
- Input: a Turing machine \(M\) and \(\omega\).
- Output: Does the Turing machine \(M\) with input \(\omega\) halt?
Another example is Hilbert's 10th problem:
- Input: A polynomial with multiple variables and coefficients in \(\Z\).
- Output: Does the polynomial have a zero in \(\Z\)?
Let \(M\) be a TM. We define
\begin{align*} \operatorname{time}_M : {\N} &\rightarrow \N \cup \left\{ \infty \right\} \\ n &\mapsto \text{ maximum time that \(M\) runs over all inputs of length \(n\)} \end{align*}Note: a 1-tape Turing machine has time complexity either \(O(n)\) or \(\Omega(n \log n)\). We cannot have something like \(\Theta(n \sqrt{\log n})\).
A TM \(M\) computes a function \[ f : \Sigma_1^{ * } \rightarrow \Sigma_2^{ * } \] if
- \(M\) always finishes for \(\omega \in \Sigma_1^{ * }\)
- \(M\) has a halting state \(q_{h}\)
- At the end the tape contains \(f(\omega)\) and all blanks around.
In particular this means we need \(\Gamma \supseteq \Sigma_1 \cup \Sigma_2 \) .
This is more general than decidability, which is the special case \(f : \Sigma^{ * } \rightarrow \left\{ 0, 1 \right\}\).
Suppose \(M\) computes a function \(f : \left\{ 0, 1 \right\}^{ * } \rightarrow \left\{ 0, 1 \right\}^{ * }\) in time \(\operatorname{time}_M(n)\) using \(\abs{\Gamma} = k\) symbols. Then there exists another TM \(\widetilde{M}\) computing \(f\) which uses only the symbols \(\widetilde{\Gamma} = \left\{ 0, 1, \sqcup \right\}\) and has time complexity \[ \operatorname{time}_{\widetilde{M}}(n) = O(\operatorname{time}_M(n) + n^2). \] Note that the \(O\) constant depends on the original TM.
The idea is to fix an injection encoding all original symbols (including the blank) \[ \Gamma \hookrightarrow \left\{ 0, 1, \sqcup \right\}^{k}, \] so each symbol becomes a block of length \(k\). The states \(\widetilde{Q}\) record partial readings of blocks. One step of \(M\) becomes \(O(k)\) steps of \(\widetilde{M}\).
The \(O(n^2)\) term comes only from copying the input into its block encoding.
Lecture 3: Deterministic Turing Machines
A \(k\)-tape Turing machine has \(k\) tapes. It's equivalent to consider transition functions which move just one or all tape-heads at once. Sometimes one of the tapes is considered to be read-only and considered as an input tape, likewise sometimes we only have an output tape, etc.
If there exists a TM with \(k\) tapes that recognizes \(L\) in time \(T(n)\), then there exists a 1-tape TM that recognizes \(L\) in time \[ O(T(n)^2). \]
In particular, if a \(k\)-tape \(M\) runs in polynomial time, so does the 1-tape version.
The idea is to interleave the tapes. We use states to remember from which tape of \(M\) we are currently reading in \(\widetilde{M}\).
The new set of symbols for \(\widetilde{M}\) is \[ \widetilde{\Gamma} = \Gamma \cup \left\{ \widehat{a} \mid a \in \Gamma \right\} \] where the symbol \(\widehat{a}\) in the tape indicates the head of the given tape is there.
We get \(\widetilde{Q} = Q \times \Gamma^k \times [k]\), plus \(O(1)\) bookkeeping states; the \(\Gamma^k\) component remembers the symbols currently under the \(k\) simulated heads.
This is a data-oblivious construction: we access the tape in the same pattern regardless of the actual data. The quadratic term comes from the fact that we sweep over the whole used portion of the tape to simulate each step.
To simulate a two-way infinite tape on a one-way infinite one: same idea as the bijection \(\N \cong \Z\). Alternatively, store pairs of cells and use \(\Gamma^2\).
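The \(\N \cong \Z\) trick, written out as a sketch (the function names are ours): cell \(i \in \Z\) of the two-way tape is stored at the nonnegative index \(\operatorname{fold}(i)\).

```python
def fold(i):
    """Bijection Z -> N: 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ..."""
    return 2 * i if i >= 0 else -2 * i - 1

def unfold(j):
    """Inverse bijection N -> Z."""
    return j // 2 if j % 2 == 0 else -(j + 1) // 2
```

The head of the simulating machine applies `fold` to every tape access, costing only a constant factor per step.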
Whenever there exists a TM \(M\) of some type that recognizes a language \(L\), there exists another TM \(\widetilde{M}\) of some other type that recognizes \(L\). Moreover, \[ \operatorname{time}_{\widetilde{M}}(n) = O\left((\operatorname{time}_M(n))^{\text{constant}}\right) \]
We say that \(\operatorname{DTIME}(f(n))\) is the set of languages \(L\) such that there exists a TM \(M\) that recognizes \(L\) and has \[ \operatorname{time}_M(n) = O(f(n)). \]
We usually take the TM to be a \(k\)-tape machine for some constant \(k\).
This definition is not robust: the class depends on the exact machine model.
We define \[ \operatorname{P} = \bigcup\limits_{d \geq 1} \operatorname{DTIME}(n^d). \]
A problem \(\in \operatorname{P}\) is
- A decision problem
- that can be recognized in polynomial time
Note that \(\operatorname{P}\) is a natural definition.
- Constants are not relevant. For instance, we can get constant speed-up using more symbols.
- It's robust wrt the type of the TM.
- Computers, contra TMs, have (finite) RAM.
- This ends up making no difference, we can simulate indexed access using a TM (with quadratic blowup), using counters.
- \(\operatorname{P}\) is closed under making calls to other programs in \(\operatorname{P}\), as long as the intermediate outputs do not grow. An issue would arise with a subroutine that doubles its input, called repeatedly.
- Currently, all of the physically realizable notions of computation give the same definition of polynomial time.
We define \[ \operatorname{EXP} = \bigcup\limits_{d \geq 1} \operatorname{DTIME} \left( 2^{n^{d}} \right) \]
\[ \operatorname{P} \subseteq \operatorname{QuasiP} \subseteq \operatorname{EXP}, \] where \(\operatorname{QuasiP} = \bigcup_{d \geq 1} \operatorname{DTIME}\left(2^{(\log_2 n)^d}\right)\). These inclusions are in fact strict, but we cannot prove that yet at this point in the course.
An Universal Turing machine is a TM \(\mathcal{U}\) that takes as input \(\left< M, x \right>\), where \(M\) is (a description of) a Turing machine, \(x\) is an input word to \(M\), and returns \(M(x)\), i.e. what the TM \(M\) would give for input \(x\).
Furthermore, \(\mathcal{U}\) simulates the computation of \(M(x)\).
Notes:
- The universal TM corresponds to an interpreter.
- \(\mathcal{U}\) can have a fixed number of symbols and states but can simulate arbitrary TMs.
- Universal TMs exist and can simulate \(M(x)\) efficiently: with 1 tape the slowdown is quadratic, and with multiple tapes we can simulate \(T\) steps of \(M\) in \(O(T \log T)\) steps.
Lecture 4: Nondeterministic computing
A nondeterministic Turing machine with 1 tape has a transition function \[ \delta : Q \times \Gamma \rightarrow \pot(Q\times \Gamma \times \left\{ 0, -1, 1 \right\}) \] which maps a (state, symbol) pair to a set of transitions, the options for the next step. Note that we have no explicit control over the branching.
A Nondeterministic Turing machine accepts a language \(L\) if
- It always finishes its computation
- If \(\omega \in L\), there must exist a computation path that finishes with \(q_{\text{accept}}\).
- If \(\omega \not\in L\), all possible branches must finish with \(q_{\text{reject}}\).
Let \(M\) be a NTM. Then
\begin{align*} \operatorname{time}_M(n) = &\text{ max of the number of steps over all inputs of size } \\ &n \text{ and over all possible choices made non-deterministically} \end{align*}Our definition takes the maximum over all inputs and choices. One can instead take the minimum over the choices, but then one must be more careful. For time complexity, both definitions are equivalent - we can always add a counter and keep track of the steps and memory.
We also have a time complexity class for nondeterministic Turing machines.
\[ \operatorname{NP} = \bigcup\limits_{d \geq 1} \operatorname{NTIME}(n^d). \] These are the decision problems for which there exists a NTM \(M\) that solves the problem in \(O(n^d)\) time for some constant \(d \geq 1\).
Note that \(\operatorname{P} \subseteq \operatorname{NP}\), since a deterministic TM is also a nondeterministic TM.
A big question of complexity theory is whether \(\operatorname{P} = \operatorname{NP}\). The classes are believed to be different, but we do not know how to approach the problem.
\(L \in \operatorname{NP}\) if and only if there exists a polynomial time deterministic TM \(M\) and a polynomial \(p\) such that \[ \forall x \in \left\{ 0, 1 \right\}^{ * } .\, x \in L \iff \exists y \in \left\{ 0, 1 \right\}^{p(\abs{x})} .\, M(x, y) = 1, \] i.e. \(M\) accepts \((x, y)\). We call \(y\) a certificate for \(x\).
Note that if the answer is no, no certificate exists: every candidate \(y\) we try is rejected.
The idea is that we can take \(y\) to be the encoding of the choices made by the machine.
\((\implies)\): Let \(L \in \operatorname{NP}\). There exists a NTM \(M\) that runs in polynomial time \(\operatorname{time}_M(n) = q(n)\). Then \[ x \in L \iff \text{ some computation path accepts } x. \] We design a deterministic TM \(\widetilde{M}\) with one extra tape \(T\). On the new tape we write \(y\), an encoding of the choices \(M\) makes to accept \(x\). The transition function of \(\widetilde{M}\) is deterministic: it consults the tape \(T\) as additional input to resolve each choice of next state.
When \(M\) rejects \(x\), all computation paths lead to rejection, so in particular every \(y\) written in \(T\) leads to rejection in \(\widetilde{M}\).
- \((\impliedby)\): Suppose there exists a deterministic TM \(M\) with the desired properties and polynomial time complexity, so \[ x \in L \iff \exists y .\, M(x, y) = 1. \] We define a nondeterministic TM \(\widetilde{M}\) that first writes a guess \(y\) nondeterministically; this gives \(2^{p(n)}\) possible computation paths. We use a counter to keep track of how much of the certificate has been written - recall that the certificate always has length exactly \(p(\abs{x})\). After writing out \(y\), we run \(M(x,y)\). In the branch where \(M(x, y) = 1\), \(\widetilde{M}\) accepts \(x\).
An example of a problem in \(\operatorname{NP}\) is Independent Set. The input is a graph \(G\) and \(k \in \N\); the question is whether there exists a set \(U \subseteq V(G)\) with \(\abs{U} \geq k\) such that \[ \forall u, v \in U .\, \left\{ u, v \right\} \notin E(G). \]
A certificate in this case is a \(U \subseteq V(G)\) that satisfies the condition.
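The verifier view, instantiated for Independent Set as a sketch (the edge-list encoding and all names are our choices): a deterministic polynomial-time \(M(x, y)\) where \(x = (G, k)\) and the certificate \(y\) is the set \(U\).

```python
def verify_independent_set(edges, n, k, U):
    """Polynomial-time verifier: is U an independent set of size >= k
    in the graph with vertices range(n) and the given edge list?"""
    U = set(U)
    if len(U) < k or not U <= set(range(n)):
        return False
    # No edge may have both endpoints inside U.
    return all(not (u in U and v in U) for u, v in edges)
```

Trying all \(2^n\) candidate sets \(U\) turns this verifier into an exponential-time decider, matching the \(2^{p(n)}\) computation paths in the proof above.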