|
barretenberg
|
KECCAAAAAAAAAAK. More...
#include <keccak.hpp>
Classes | |
| struct | keccak_state |
Public Types | |
| using | witness_ct = stdlib::witness_t< Builder > |
| using | field_ct = stdlib::field_t< Builder > |
| using | bool_ct = stdlib::bool_t< Builder > |
| using | byte_array_ct = stdlib::byte_array< Builder > |
| using | uint32_ct = stdlib::uint32< Builder > |
Static Public Member Functions | |
| static constexpr uint256_t | convert_to_sparse (uint256_t input) |
| Convert a binary integer into a base11 integer. | |
| static constexpr uint256_t | normalize_sparse (uint256_t input) |
| Normalize a base-11 integer where each base value can be > 1. | |
| static constexpr std::array< uint256_t, NUM_KECCAK_ROUNDS > | get_sparse_round_constants () |
| Get the sparse round constants object. | |
| static constexpr uint256_t | get_chi_offset () |
Compute the constant offset added in the Chi round. | |
| template<size_t lane_index> | |
| static field_t< Builder > | normalize_and_rotate (const field_ct &limb, field_ct &msb) |
Normalize a base-11 limb and left-rotate by keccak::ROTATIONS[lane_index] bits. This method also extracts the most significant bit of the normalised rotated limb. Used in the RHO and IOTA rounds and in sponge_absorb. | |
| static void | compute_twisted_state (keccak_state &internal) |
| Compute twisted representation of hash lane. | |
| static void | theta (keccak_state &state) |
| THETA round. | |
| static void | rho (keccak_state &state) |
| RHO round. | |
| static void | pi (keccak_state &state) |
| PI. | |
| static void | chi (keccak_state &state) |
| CHI. | |
| static void | iota (keccak_state &state, size_t round) |
| IOTA. | |
| static void | sponge_absorb (keccak_state &internal, const std::vector< field_ct > &input_buffer, const std::vector< field_ct > &msb_buffer, const field_ct &num_blocks_with_data) |
| static byte_array_ct | sponge_squeeze (keccak_state &internal) |
| static void | keccakf1600 (keccak_state &state) |
| static byte_array_ct | hash (byte_array_ct &input, const uint32_ct &num_bytes) |
| static byte_array_ct | hash (byte_array_ct &input) |
| static std::vector< field_ct > | format_input_lanes (byte_array_ct &input, const uint32_ct &num_bytes) |
| Convert the input buffer into 8-bit keccak lanes in little-endian form. Additionally, insert padding bytes if required, and add the keccak terminating bytes 0x1/0x80 (0x1 inserted after the final byte of input data) (0x80 inserted at the end of the final block) | |
| static std::vector< uint8_t > | hash_native (const std::vector< uint8_t > &data) |
Static Public Attributes | |
| static constexpr uint256_t | BASE = 11 |
| static constexpr size_t | BITS = 256 |
| static constexpr size_t | WORD_SIZE = 8 |
| static constexpr size_t | BLOCK_SIZE = (1600 - BITS * 2) / WORD_SIZE |
| static constexpr size_t | LIMBS_PER_BLOCK = BLOCK_SIZE / 8 |
| static constexpr size_t | NUM_KECCAK_ROUNDS = 24 |
| static constexpr size_t | NUM_KECCAK_LANES = 25 |
| static constexpr std::array< uint64_t, NUM_KECCAK_ROUNDS > | RC |
| static constexpr std::array< size_t, NUM_KECCAK_LANES > | ROTATIONS |
| static constexpr std::array< uint256_t, NUM_KECCAK_ROUNDS > | SPARSE_RC = get_sparse_round_constants() |
| static constexpr uint256_t | CHI_OFFSET = get_chi_offset() |
KECCAAAAAAAAAAK.
Creates constraints that evaluate the Keccak256 hash algorithm.
UltraPlonk only due to heavy lookup table use.
Current cost 17,329 constraints for a 1-block hash using small(ish) lookup tables (total size < 2^64)
| Builder |
|
static |
CHI.
The CHI round applies the following logic to the hash lanes: A XOR (~B AND C)
In base-11 representation we can create an equivalent linear operation: 1 + 2A - B + C
Output values will range from [0, 1, 2, 3, 4] and are mapped back into [0, 1] via the KECCAK_CHI_OUTPUT lookup table
N.B. the KECCAK_CHI_OUTPUT table also has a column for the most significant bit of each lookup. We use this to create a 'twisted representation of each hash lane (see THETA comments for more details)
| Builder |
|
static |
Compute twisted representation of hash lane.
The THETA round requires computation of XOR(A, ROTL(B, 1))
We do this via a 'twisted' base-11 representation.
If the bit slices for a regular variable are arranged [b63, ..., b0], the twisted representation is a 65-bit variable [b63, ..., b0, b63]
The equivalent of XOR(A, ROTL(B, 1)) is A.twist + 2B.twist (in base-11 form) The output is present in bit slices 1-64
| Builder |
| internal |
|
inlinestaticconstexpr |
Convert a binary integer into a base11 integer.
Input = \sum_{i=0}^63 b_i * 2^i Output = \sum_{i=0}^63 b_i * 11^i
| input |
|
static |
Convert the input buffer into 8-bit keccak lanes in little-endian form. Additionally, insert padding bytes if required, and add the keccak terminating bytes 0x1/0x80 (0x1 inserted after the final byte of input data) (0x80 inserted at the end of the final block)
| Builder |
| input | |
| num_bytes |
|
inlinestaticconstexpr |
Compute the constant offset added in the Chi round.
We want to compute, for each bit slice of the inputs A, B, C... 1 + 2A - B + C
i.e. we need to add the constant value \sum_{i=0}^63 11^i
|
inlinestaticconstexpr |
Get the sparse round constants object.
|
static |
IOTA.
XOR first hash limb with a precomputed constant. We re-use the RHO_OUTPUT table to normalize after this operation
| Builder |
| internal | |
| round |
|
static |
Normalize a base-11 limb and left-rotate by keccak::ROTATIONS[lane_index] bits. This method also extracts the most significant bit of the normalised rotated limb. Used in the RHO and IOTA rounds and in sponge_absorb.
Normalize process: Input v = \sum_{i=0}^63 b_i * 11^i , where b is in range [0, 1, 2] Output = \sum_{i=0}^63 (b_i & 1) * 11^i (i.e. even values go to 0)
Implementation is via a sequence of lookup tables
| lane_index | What keccak lane are we working on? |
| limb | Input limb we want to normalize and rotate |
| msb | (return parameter) The most significant bit of the normalized and rotated limb |
manually construct the ReadData object required to generate plookup gate constraints. To explain in more detail: the input integer can be represented via two the bit slices [A, B] (A = left, B = right)
For example, imagine our input is a 32-bit integer A represented as: A = A3.11^24 + A2.11^16 + A1.11^8 + A0, and our output is a 32-bit integer B = B3.11^24 + B2.11^16 + B1.11^8 + B0
In this example, we want to normalize A and left-rotate by 16 bits.
Our lookup gate wire values will look like the following:
| Row | C0 | C1 | C2 |
|---|---|---|---|
| 0 | A3.11^24 + A2.11^16 + A1.11^8 + A0 | B1.11^8 + B0 | A0.msb() |
| 1 | A3.11^16 + A2.11^8 + A1 | B1 | A1.msb() |
| 2 | A1311^8 + A2 | B3.11^8 + B2 | A2.msb() |
| 3 | A3 | B3 | A3.msb() |
The plookup table keys + values are derived via the expression:
C1[i] + C1[i+1].q1[i] = LOOKUP[C0[i] + C0[i+1].q0[i]]
(the same applies for C2, however q2[i] = 0 for all rows)
The plookup coefficients for the rows treat Column0 as a single accumulating sum, but Column1 is a pair of accumulating sums. In the above example, the q coefficient value are:
| Row | Q1 | Q2 | Q3 |
|---|---|---|---|
| 0 | 11^8 | 11^8 | 0 |
| 1 | 11^8 | 0 | 0 |
| 2 | 11^8 | 11^8 | 0 |
| 3 | 0 | 0 | 0 |
stdlib::plookup cannot derive witnesses in the above pattern without a substantial rewrite, so we do it manually in this method!
|
inlinestaticconstexpr |
Normalize a base-11 integer where each base value can be > 1.
Input = \sum_{i=0}^63 b_i * 11^i Output = \sum_{i=0}^63 (b_i & 1) * 11^i
(XORs are evaluated by adding integers in sparse-form and normalizing. Even = 0, Odd = 1)
| input |
|
static |
PI.
PI permutes the keccak lanes. Adds 0 constraints as this is simply a re-ordering of witnesses
| Builder |
| internal |
|
static |
RHO round.
| Builder |
The limbs of internal.state are represented via base-11 integers limb = \sum_{i=0}^63 b_i * 11^i The value of each b_i can be in the range [0, 1, 2] due to the THETA round XOR operations
We need to do the following:
rotations matrixThe KECCAK_RHO_OUTPUT lookup table is used for both. See normalize_and_rotate for more details
COST PER LIMB... 8 gates for first lane (no rotation. Lookup table is 8-bits per slice = 8 lookups for 64 bits) 10 gates for other 24 lanes (lookup sequence is split into 6 8-bit slices and 2 slices that sum to 8 bits, an addition gate is required to complete the rotation)
Total costs is 248 gates.
N.B. Can reduce lookup costs by using larger lookup tables. Current algo is optimized for lookup tables where sum of all table sizes is < 2^64
|
static |
THETA round.
| Builder |
THETA consists of XOR operations as well as left rotations by 1 bit.
We represent 64-bit integers in a base-11 representation where limb = \sum_{i=0}^63 b_i * 11^i
At the start of THETA, all b_i values are either 0 or 1
We can efficiently evaluate XOR operations via simple additions! If b_i = even, this represents a bit value of 0 If b_i = odd, this represents a bit value of 1
The KECCAK_THETA_OUTPUT lookup table is used to 'normalize' base-11 integers, i.e. convert b_i values from [0, ..., 10] to [0, 1] where even == 0, odd == 1
The choice of base for our representation effects the following:
Bigger base reduces (1) but increases (2). For THETA, base-11 is optimal (I think...)
We need to left-rotate the C[5] array by 1-bit to compute D[5]. Naive way is expensive so we cheat! When converting integers into base-11 representation, we use a lookup table column to give us the most significant bit of the integer.
This enables us to create a 'twisted' representation of the integer in base-11:
twisted_limb = (b_63) + \sum_{i=0}^63 b_i * 11^{i + 1}
e.g. if limb's bit ordering is [0, b63, ..., b1, b0 ] twisted limb bit ordering is [b63, b62, ..., b0, b63]
We want to be able to compute XOR(A, B.rotate_left(1)) and can do this via twisted representations
The equivalent in base-11 world is twisted_A * 2 + twisted_B. The output of the XOR operation exists in bit-slices 1, ..., 63 (which can be extracted by removing the least and most significant slices of the output) This is MUCH cheaper than the extra range constraints required for a naive left-rotation
Total cost of theta = 20.5 gates per 5 lanes + 25 = 127.5 per round
field_ct::accumulate can compute 5 addition operations in only 2 gates: Gate 0 wires [a0, a1, a2, a3] Gate 1 wires [b0, b1, b2, b3] b3 = a0 + a1 + a2 + a3 b2 = b3 + b0 + b1 (b2 is the output wire)
Compute D by exploiting twisted representation to get a cheap left-rotation by 1 bit
D contains 66 base-11 slices.
We need to remove the 2 most significant slices as they are artifacts of our twist operation.
We also need to 'normalize' D (i.e. convert each base value to be 0 or 1), to prevent our base from overflowing when we XOR D into internal.state
KECCAK_THETA_OUTPUT currently splices its input into 16 4-bit slices (in base 11 i.e. from 0 to 11^4 - 1) This ensures that sliced_D is correctly range constrained to be < 11^64
|
staticconstexpr |
|
staticconstexpr |