Subject: Re: [pqc-forum] Question on Classic McEliece definition of FixedWeight
From: "D. J. Bernstein"
Date: Fri, 24 Feb 2023 08:51:06 +0100
To: pqc-forum
Message-ID: <20230224075106.560655.qmail@cr.yp.to>

The general context here is the Classic McEliece specification defining exactly how random input is used, including exactly how much randomness is used. https://classic.mceliece.org/mceliece-rationale-20221023.pdf includes the security rationale for this in Section 4.

In particular, pure rejection-sampling loops such as "flip a coin until it's heads" can consume an arbitrary amount of randomness, although the chance of this happening drops exponentially with the amount. For careful testing and verification, it's important to note that the possible coin sequences used are heads, tails-heads, tails-tails-heads, etc., and not, e.g., heads-tails with the tails being ignored.

Wrenna Robson writes:
> However, what then is the domain of the FixedWeight mathematical
> function?

The specification "defines each mathematical function F by presenting an algorithm to compute F" (page 3). For randomized functions such as FixedWeight, the set of strings of random bits allowed as input "is defined as the set of strings of random bits consumed by these algorithms" (page 3).

In particular, the specification presents a FixedWeight algorithm (page 10). The FixedWeight mathematical function is the function computed by this algorithm. The domain of the function is the set of strings of random bits consumed by this algorithm. The output of the mathematical function, for any particular string in the domain, is the output of this algorithm for that string of random bits.

FixedWeight begins by generating sigma_1 t random bits (Step 1). It then performs a specific computation (Steps 2, 3, 4), which may lead the algorithm to restart; otherwise the algorithm produces an output (Steps 5 and 6). Consequently, the domain consists of

* all accepted strings R_0 of sigma_1 t bits, where "accepted" means that the algorithm doesn't restart;

* all strings R_0+R_1 (with + for concatenation) where R_0 is a rejected string of sigma_1 t bits ("rejected" means non-accepted) and R_1 is an accepted string of sigma_1 t bits;

* all strings R_0+R_1+R_2 where R_0 and R_1 are rejected and R_2 is accepted;

etc.

Note that, in general, algorithms do not necessarily halt. If you're formalizing the definition of the output of an algorithm, you have to include an extra output possibility meaning non-halting. See, e.g., the definition of "M(x) = ↗" on page 20 of Papadimitriou's 1993 Computational Complexity textbook.

In particular, pure rejection-sampling algorithms such as FixedWeight can consume an infinite string R_0+R_1+... rather than halting---which doesn't matter in reality for any well-designed algorithm since it happens with probability 0, but does matter for rigorous proofs. The domain includes all R_0+R_1+... such that each R_i is rejected.

Something else to watch out for in formalization is that elementary definitions of randomness for finite sets can't formalize the infinite RNG-output string as a random object, can't formalize statements such as "the average number of coin flips until heads is 2", and can't formalize statements about average run times of typical algorithms. The necessary definitions for rigorous analysis of probabilistic algorithms come from measure theory. See https://gilith.com/papers/termination.pdf for an example of formally verifying the behavior of probabilistic algorithms.
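To make the shape of that domain concrete, here is a minimal sketch (Python, purely illustrative: the block size and the accept() test are placeholders, not the real Steps 2-4 of the specification) of a pure rejection-sampling loop that records exactly which random bits it consumes:

   # Illustrative sketch only, not the specification's FixedWeight:
   # a pure rejection-sampling loop that restarts on rejection and
   # records exactly which random bits it consumes.

   def rejection_sample(random_bits, block_bits, accept):
       consumed = []
       while True:                    # runs forever on an all-rejected stream
           block = [next(random_bits) for _ in range(block_bits)]
           consumed.extend(block)     # no bits are ever skipped or reused
           output = accept(block)
           if output is not None:     # accepted: halt
               return output, consumed
           # rejected: loop back and consume a fresh block

   # Toy usage: accept a block exactly when it has weight 4.
   import random
   bits = iter(lambda: random.getrandbits(1), 2)    # endless stream of bits
   out, consumed = rejection_sample(bits, 8, lambda b: b if sum(b) == 4 else None)

The string consumed is then one element of the domain in the sense above: zero or more rejected blocks followed by one accepted block.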
> As written, steps 3 and 4 can cause
> a restart of the algorithm back to 1, which then requires taking more
> random bits. So in theory FixedWeight could need an arbitrary number
> of random bits before it successfully outputs,

Right. Pure rejection-sampling algorithms can consume arbitrarily long finite strings, or infinite strings in the non-halting case.

> and it isn't deterministic.

Correct. That isn't tied to the number of random bits used or to the internal rejection-sampling structure: encapsulation algorithms (and keygen algorithms) always have to be randomized for security.

It's normal for larger systems to make their own decisions about RNGs as separate modules from cryptosystem specifications. Security review is partitioned the same way; see Section 4 of the rationale.

> This is in contrast to the approach with KeyGen, where KeyGen is
> described as a no-input function which generates a random string l,
> and then SeededKeyGen uses the pseudorandom bit generator G in order
> to generate the required bits: if there is a restart in SeededKeyGen,
> the output of G is used to produce a new seed. So once you decide the
> initial seed, it is deterministic.

Right. You can think of key generation as being constructed in three layers: first there's a pure rejection-sampling algorithm; then this is composed with a stream cipher generating randomness for that algorithm; then this is composed with the external RNG to generate the cipher key.

The reason for this complication is explained in Section 4.2 of the rationale. In short, the details support a spectrum of interoperable compression options for the private key.

For formalization, one disadvantage of merging symmetric cryptography into rejection-sampling loops is that suddenly it's no longer clear that the composed algorithm has probability 0 of running forever. However, suitable PRF assumptions imply that the probability is extremely close to 0.

> Is there an intent that FixedWeight be given a definition which makes
> it deterministic in an analogous way?

Any test framework that provides a deterministic randombytes() for test vectors (e.g., SUPERCOP) is already doing this in a modular way, without complicating individual cryptosystem specifications.

McTiny needed to compress the encapsulation state to meet extreme performance requirements, and simply took the modular approach to compressing this state. McEliece-specific compression options provide interesting performance tradeoffs for private keys but not for the encapsulation state. See Section 4.5 of the rationale: "Applications might plug in an RNG that generates output from a series of seeds as in key generation, allowing a ciphertext to be compressed to the final seed. See, e.g., [8]. The same compression approach works for rejection sampling in much more generality."

---D. J. Bernstein (on behalf of the Classic McEliece team)
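For concreteness, here is a minimal sketch of the three-layer construction described above (Python, purely illustrative: hashlib.shake_256 merely stands in for the pseudorandom bit generator G, and the seed length, output layout, and acceptance test are placeholders, not the specification's SeededKeyGen):

   # Illustrative sketch only, not the specification's SeededKeyGen.
   import hashlib, secrets

   SEED_BYTES = 32        # placeholder seed length
   BLOCK_BYTES = 64       # placeholder amount of randomness per attempt

   def G(seed):
       # Stand-in PRG: expand the seed into randomness for one attempt
       # plus a fresh seed to use if that attempt is rejected.
       out = hashlib.shake_256(seed).digest(BLOCK_BYTES + SEED_BYTES)
       return out[:BLOCK_BYTES], out[BLOCK_BYTES:]

   def try_keygen(randomness):
       # Layer 1: pure rejection-sampling core. Toy acceptance test:
       # succeed exactly when the first byte is nonzero.
       return randomness if randomness[0] != 0 else None

   def seeded_keygen(seed):
       # Layer 2: deterministic given the seed; a restart derives a new
       # seed via G rather than asking the caller for more randomness.
       while True:
           randomness, seed = G(seed)
           key = try_keygen(randomness)
           if key is not None:
               return key

   def keygen():
       # Layer 3: the external RNG is consulted only for the initial seed.
       return seeded_keygen(secrets.token_bytes(SEED_BYTES))

Given the same initial seed, seeded_keygen always returns the same key, which is the determinism described in the quoted text; a modular deterministic randombytes() achieves the same effect for FixedWeight without adding such layering to the specification.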