barretenberg

Classes
| struct | chain_add_accumulator |
| struct | secp256k1_wnaf |
| struct | secp256k1_wnaf_pair |
Public Member Functions
| element (const typename NativeGroup::affine_element &input) | |
| element (const Fq &x, const Fq &y) | |
| element (const element &other) | |
| element (element &&other) | |
| void | validate_on_curve () const |
| element & | operator= (const element &other) |
| element & | operator= (element &&other) |
| byte_array< Builder > | to_byte_array () const |
| element | operator+ (const element &other) const |
| element | operator- (const element &other) const |
| element | operator- () const |
| element | operator+= (const element &other) |
| element | operator-= (const element &other) |
| std::array< element, 2 > | add_sub (const element &other) const |
| Compute (*this) + other AND (*this) - other as a size-2 array. | |
| element | operator* (const Fr &other) const |
| element | conditional_negate (const bool_t< Builder > &predicate) const |
| element | normalize () const |
| element | reduce () const |
| element | dbl () const |
| element | montgomery_ladder (const element &other) const |
| element | montgomery_ladder (const chain_add_accumulator &accumulator) |
| element | multiple_montgomery_ladder (const std::vector< chain_add_accumulator > &to_add) const |
| Perform repeated iterations of the montgomery ladder algorithm. | |
| element | quadruple_and_add (const std::vector< element > &to_add) const |
| Compute 4.P + to_add[0] + ... + to_add[to_add.size() - 1]. | |
| NativeGroup::affine_element | get_value () const |
| Builder * | get_context () const |
| Builder * | get_context (const element &other) const |
| template<size_t max_num_bits> | |
| element< C, Fq, Fr, G > | wnaf_batch_mul (const std::vector< element > &points, const std::vector< Fr > &scalars) |
| template<typename , typename > requires (IsNotGoblinBuilder<C>) | |
| element< C, Fq, Fr, G > | bn254_endo_batch_mul_with_generator (const std::vector< element > &big_points, const std::vector< Fr > &big_scalars, const std::vector< element > &small_points, const std::vector< Fr > &small_scalars, const Fr &generator_scalar, const size_t max_num_small_bits) |
| template<typename , typename > requires (IsNotGoblinBuilder<C>) | |
| element< C, Fq, Fr, G > | bn254_endo_batch_mul (const std::vector< element > &big_points, const std::vector< Fr > &big_scalars, const std::vector< element > &small_points, const std::vector< Fr > &small_scalars, const size_t max_num_small_bits) |
| template<size_t max_num_bits, size_t WNAF_SIZE> | |
| std::vector< field_t< C > > | compute_wnaf (const Fr &scalar) |
| template<typename , typename > | |
| element< C, Fq, Fr, G > | secp256k1_ecdsa_mul (const element &pubkey, const Fr &u1, const Fr &u2) |
| template<size_t num_elements, typename > | |
| std::array< twin_rom_table< C >, 5 > | create_group_element_rom_tables (const std::array< element, num_elements > &rom_data, std::array< uint256_t, 8 > &limb_max) |
| Constructs a ROM table to look up linear combinations of group elements. | |
| template<size_t , typename > | |
| element< C, Fq, Fr, G > | read_group_element_rom_tables (const std::array< twin_rom_table< C >, 5 > &tables, const field_t< C > &index, const std::array< uint256_t, 8 > &limb_max) |
Static Public Member Functions
| static element | from_witness (Builder *ctx, const typename NativeGroup::affine_element &input) |
| static element | one (Builder *ctx) |
| static chain_add_accumulator | chain_add_start (const element &p1, const element &p2) |
| static chain_add_accumulator | chain_add (const element &p1, const chain_add_accumulator &accumulator) |
| static element | chain_add_end (const chain_add_accumulator &accumulator) |
| template<size_t max_num_bits = 0> | |
| static element | wnaf_batch_mul (const std::vector< element > &points, const std::vector< Fr > &scalars) |
| static element | batch_mul (const std::vector< element > &points, const std::vector< Fr > &scalars, const size_t max_num_bits=0) |
| static element | goblin_batch_mul (const std::vector< element > &points, const std::vector< Fr > &scalars, const size_t max_num_bits=0) |
| Goblin style batch multiplication. | |
| template<typename X = NativeGroup, typename = typename std::enable_if_t<std::is_same<X, barretenberg::g1>::value>> requires (IsNotGoblinBuilder<Builder>) | |
| static element | bn254_endo_batch_mul (const std::vector< element > &big_points, const std::vector< Fr > &big_scalars, const std::vector< element > &small_points, const std::vector< Fr > &small_scalars, const size_t max_num_small_bits) |
| template<typename X = NativeGroup, typename = typename std::enable_if_t<std::is_same<X, barretenberg::g1>::value>> requires (IsNotGoblinBuilder<Builder>) | |
| static element | bn254_endo_batch_mul_with_generator (const std::vector< element > &big_points, const std::vector< Fr > &big_scalars, const std::vector< element > &small_points, const std::vector< Fr > &small_scalars, const Fr &generator_scalar, const size_t max_num_small_bits) |
| template<typename X = NativeGroup, typename = typename std::enable_if_t<std::is_same<X, secp256k1::g1>::value>> | |
| static element | secp256k1_ecdsa_mul (const element &pubkey, const Fr &u1, const Fr &u2) |
| static std::vector< bool_t< Builder > > | compute_naf (const Fr &scalar, const size_t max_num_bits=0) |
| template<size_t max_num_bits = 0, size_t WNAF_SIZE = 4> | |
| static std::vector< field_t< Builder > > | compute_wnaf (const Fr &scalar) |
| template<size_t wnaf_size, size_t staggered_lo_offset = 0, size_t staggered_hi_offset = 0> | |
| static secp256k1_wnaf_pair | compute_secp256k1_endo_wnaf (const Fr &scalar) |
Public Attributes
| Fq | x |
| Fq | y |
std::array<element<C, Fq, Fr, G>, 2> proof_system::plonk::stdlib::element<C, Fq, Fr, G>::add_sub(const element<Builder, Fq, Fr, NativeGroup>& other) const
Compute (*this) + other AND (*this) - other as a size-2 array.
We require this operation when computing biggroup lookup tables for multi-scalar-multiplication. This combined method reduces the number of field additions, field subtractions required (as well as 1 less assert_is_not_equal check)
Template Parameters: C, Fq, Fr, G

Parameters: other
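A usage sketch based on the declaration above (P and Q are hypothetical, already-constructed circuit points):

```cpp
// One call produces both results, sharing intermediate field operations.
std::array<element, 2> sum_diff = P.add_sub(Q);
element sum  = sum_diff[0]; // P + Q
element diff = sum_diff[1]; // P - Q
```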
static element proof_system::plonk::stdlib::element<C, Fq, Fr, G>::batch_mul(const std::vector<element>& points, const std::vector<Fr>& scalars, const size_t max_num_bits = 0)
Generic batch multiplication that works for all elliptic curve types.
Implementation is identical to bn254_endo_batch_mul, but WITHOUT the endomorphism transforms OR support for short scalars. See bn254_endo_batch_mul for a description of the algorithm.
element<C, Fq, Fr, G> proof_system::plonk::stdlib::element<Builder, Fq, Fr, NativeGroup>::bn254_endo_batch_mul(const std::vector<element<Builder, Fq, Fr, NativeGroup>>& big_points, const std::vector<Fr>& big_scalars, const std::vector<element<Builder, Fq, Fr, NativeGroup>>& small_points, const std::vector<Fr>& small_scalars, const size_t max_num_small_bits)
A batch multiplication method for the BN254 curve. This method is only available if Fr == field_t<barretenberg::fr>
Parameters:
| big_points | group elements we will multiply by full 254-bit scalar multipliers |
| big_scalars | 254-bit scalar multipliers. We want to compute (\sum big_scalars[i] * big_points[i]) |
| small_points | group elements we will multiply by short scalar multipliers whose max value will be (1 << max_num_small_bits) |
| small_scalars | short scalar multipliers whose max value will be (1 << max_num_small_bits) |
| max_num_small_bits | MINIMUM value must be 128 bits (we will be splitting big_scalars into two 128-bit scalars; we assume all scalars after this transformation are 128 bits) |
Split big scalars into short 128-bit scalars.
For big_scalars we use the BN254 curve endomorphism to split the scalar into two short 128-bit scalars. i.e. for scalar multiplier k we derive 128-bit values k1, k2 where: k = k1 - k2 * \lambda (\lambda is the cube root of unity modulo the group order of the BN254 curve)
This ensures ALL our scalar multipliers can now be treated as 128-bit scalars, which halves the number of iterations of our main "double and add" loop!
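To make the split concrete, here is a minimal, self-contained toy model (not the library code, and not the real BN254 parameters): modulo r = 7, lambda = 2 is a cube root of unity since 2^3 = 8 ≡ 1 (mod 7), and every residue k decomposes into half-width k1, k2 with k = k1 - lambda * k2:

```cpp
#include <cstdio>

// Toy endomorphism split: every k in [0, 7) can be written as
// k = k1 - lambda * k2 (mod 7) with k1, k2 in the half-width range {0, 1, 2}.
int main() {
    const int r = 7, lambda = 2;
    for (int k = 0; k < r; ++k) {
        bool found = false;
        for (int k1 = 0; k1 < 3 && !found; ++k1) {
            for (int k2 = 0; k2 < 3 && !found; ++k2) {
                if (((k1 - lambda * k2) % r + r) % r == k) {
                    std::printf("k=%d -> k1=%d, k2=%d\n", k, k1, k2);
                    found = true;
                }
            }
        }
    }
    return 0;
}
```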
Compute batch_lookup_table
batch_lookup_table implements a lookup table for a vector of points.
We subdivide batch_lookup_table into a set of 3-bit lookup tables, (using 2-bit and 1-bit tables if points.size() is not a multiple of 8)
We index the lookup table using a vector of NAF values for each point
e.g. for points P_1, ..., P_n and NAF values s_1, ..., s_n (where s_i = +1 or -1), the lookup table will compute:
\sum_{i=0}^n (s_i ? -P_i : P_i)
Compute scalar multiplier NAFs
A Non-Adjacent Form is a representation of an integer where each 'bit' is either +1 OR -1, i.e. each bit entry is non-zero. This is VERY useful for biggroup operations, as it removes the need to conditionally add points depending on whether the scalar mul bit is +1 or 0 (instead we multiply the y-coordinate by the NAF value, which is cheaper)
The vector naf_entries tracks the NAF set for each point, where each NAF set is a vector of bools: if naf[i][j] = 0 this represents a NAF value of -1; if naf[i][j] = 1 this represents a NAF value of +1.
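The encoding round-trips with a few lines of standalone C++ (a toy model, independent of the circuit types; values illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Toy NAF round-trip: an odd n-bit k satisfies k = sum_i d_i * 2^i with
// d_i in {-1, +1}. Writing d_i = 2*e_i - 1 gives k = 2*E - (2^n - 1) for
// E = sum_i e_i * 2^i, so E = (k + 2^n - 1) / 2 and bit i of E plays the
// role of naf[i][j] above (0 -> -1, 1 -> +1).
int main() {
    const unsigned n = 8;
    const uint64_t k = 181; // odd, fits in 8 bits
    const uint64_t E = (k + (uint64_t(1) << n) - 1) / 2;
    int64_t recovered = 0;
    for (unsigned i = 0; i < n; ++i) {
        const int d = ((E >> i) & 1) ? 1 : -1;
        recovered += d * (int64_t(1) << i);
    }
    assert(recovered == int64_t(k));
    return 0;
}
```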
Initialize accumulator point with an offset generator. See compute_offset_generators for detailed explanation
Get the initial entry of our point table. This is the same as point_table.get_accumulator for the most significant NAF entry. HOWEVER, we know the most significant NAF value is +1 because our scalar muls are positive. get_initial_entry handles this special case as it's cheaper than point_table.get_accumulator
Main "double and add" loop
Each loop iteration traverses TWO bits of our scalar multiplier. The algorithm performs the following:

1. Extract the NAF value for bit 2*i - 1 for each scalar multiplier and store it in the nafs vector.
2. Use the nafs vector to derive the point that we need (add_1) to add into our accumulator.
3. Repeat the above for bit 2*i (add_2).
4. Compute accumulator = 4 * accumulator + 2 * add_1 + add_2 using the multiple_montgomery_ladder method.

The purpose of the above is to minimize the number of required range checks (vs a simple double and add algo).
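A minimal integer model of this loop (group elements replaced by integers, so point addition is + and one ladder step is literally 4*acc + 2*add_1 + add_2; all values illustrative):

```cpp
#include <cassert>

// Integer stand-in for the double-and-add loop: 5 MSB-first NAF digits,
// consumed two per iteration after the initial table entry.
int main() {
    const int n = 5;
    const int naf[n] = { 1, -1, 1, 1, -1 };    // each digit is +1 or -1
    long scalar = 0;
    for (int d : naf) scalar = 2 * scalar + d; // the value the NAF encodes (13)
    const long P = 5;                          // "point" in the integer model
    long acc = naf[0] * P;                     // initial entry: leading digit
    for (int i = 1; i + 1 < n; i += 2) {
        const long add_1 = naf[i] * P;         // table lookup for first digit
        const long add_2 = naf[i + 1] * P;     // table lookup for second digit
        acc = 4 * acc + 2 * add_1 + add_2;     // one multiple_montgomery_ladder
    }
    assert(acc == scalar * P);
    return 0;
}
```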
When computing repeated iterations of the montgomery ladder algorithm, we can neglect computing the y-coordinate of each ladder output. See multiple_montgomery_ladder for more details.
Recovering a point from our point table requires group additions iff the table is >3 bits. We can chain repeated add operations together without computing the y-coordinate of intermediate addition outputs.
This is represented using the chain_add_accumulator type. See the type declaration for more details
(this is cheaper than regular additions iff point_table.get_accumulator requires 2 or more point additions. Cost is the same as point_table.get_accumulator if 1 or 0 point additions are required)
Handle skew factors.
We represent scalar multipliers via Non Adjacent Form values (NAF). In a NAF, each bit value is either -1 or +1. We use this representation to avoid having to conditionally add points (i.e. every bit we iterate over will result in either a point addition or subtraction, instead of conditionally adding a point into an accumulator, we conditionally negate the point's y-coordinate and always add it into the accumulator)
However! The problem here is that we can only represent odd integers with a NAF. For even integers we add +1 to the integer and set that multiplier's skew value to true.
We record a scalar multiplier's skew value at the end of their NAF values (naf_entries[point_index][num_rounds])
If the skew is true, we must subtract the original point from the accumulator.
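The skew correction is simple arithmetic, shown here in an integer model (P stands in for a group element; values illustrative):

```cpp
#include <cassert>

// Even scalars are not NAF-representable, so encode k+1 and set skew = true;
// the accumulator then subtracts P once at the end.
int main() {
    const long P = 7;                        // stand-in for a group element
    const long k = 44;                       // even scalar multiplier
    const bool skew = (k % 2 == 0);
    const long encoded = k + (skew ? 1 : 0); // odd, so NAF-representable
    const long result = encoded * P - (skew ? P : 0);
    assert(result == k * P);
    return 0;
}
```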
element<C, Fq, Fr, G> proof_system::plonk::stdlib::element<Builder, Fq, Fr, NativeGroup>::bn254_endo_batch_mul_with_generator(const std::vector<element<Builder, Fq, Fr, NativeGroup>>& big_points, const std::vector<Fr>& big_scalars, const std::vector<element<Builder, Fq, Fr, NativeGroup>>& small_points, const std::vector<Fr>& small_scalars, const Fr& generator_scalar, const size_t max_num_small_bits)
Perform a multi-scalar multiplication over the BN254 curve
The inputs are:
| big_scalars/big_points | 254-bit scalar multipliers (hardcoded to be 4 at the moment) |
| small_scalars/small_points | 128-bit scalar multipliers |
| generator_scalar | a 254-bit scalar multiplier over the bn254 generator point |
static chain_add_accumulator proof_system::plonk::stdlib::element<C, Fq, Fr, G>::chain_add(const element& p1, const chain_add_accumulator& accumulator)
We compute the following terms:
lambda = (acc.lambda_prev * (acc.x1_prev - acc.x) - acc.y1_prev - p1.y) / (acc.x - p1.x)
x3 = lambda * lambda - acc.x - p1.x

Requires only 2 non-native field reductions.
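These relations can be sanity-checked outside the circuit with a toy curve, y^2 = x^3 + 3 over F_13 (all parameters illustrative, not the real base field):

```cpp
#include <cassert>
#include <cstdio>

// Toy check of the chain-add relations: after P1 + P2 we never materialize y3
// directly; the next addition reconstructs it as
//   y3 = lambda_prev * (x1_prev - x3) - y1_prev.
static int mod(int a, int p) { return ((a % p) + p) % p; }
static int inv(int a, int p) { // Fermat inverse a^(p-2) by repeated multiplication
    int r = 1;
    for (int e = p - 2; e > 0; --e) r = mod(r * a, p);
    return r;
}
int main() {
    const int p = 13;
    const int x1 = 1, y1 = 2;  // P1
    const int x2 = 3, y2 = 2;  // P2
    const int xq = 0, yq = 4;  // P3
    // First addition: retain lambda_prev, x1_prev, y1_prev; skip y3.
    const int lambda_prev = mod((y2 - y1) * inv(mod(x2 - x1, p), p), p);
    const int x3 = mod(lambda_prev * lambda_prev - x1 - x2, p);
    // Chain step: reconstruct y3 on the fly, then add P3 as usual.
    const int y3 = mod(lambda_prev * (x1 - x3) - y1, p);
    const int lambda = mod((y3 - yq) * inv(mod(x3 - xq, p), p), p);
    const int x4 = mod(lambda * lambda - x3 - xq, p);
    const int y4 = mod(lambda * (x3 - x4) - y3, p);
    // The chained result lands back on the curve.
    assert(mod(y4 * y4, p) == mod(x4 * x4 * x4 + 3, p));
    std::printf("P1 + P2 + P3 = (%d, %d)\n", x4, y4);
    return 0;
}
```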
static element proof_system::plonk::stdlib::element<C, Fq, Fr, G>::chain_add_end(const chain_add_accumulator& accumulator)

End an addition chain. Produces a full output group element with a y-coordinate.
static chain_add_accumulator proof_system::plonk::stdlib::element<C, Fq, Fr, G>::chain_add_start(const element& p1, const element& p2)
We can chain repeated point additions together, where we only require 2 non-native field multiplications per point addition, instead of 3
Evaluate a chain addition!
When adding a set of points P_1 + ... + P_N, we do not need to compute the y-coordinate of intermediate addition terms.
i.e. we substitute acc.y with acc.y = acc.lambda_prev * (acc.x1_prev - acc.x) - acc.y1_prev
lambda_prev, x1_prev, y1_prev are the lambda, x1, y1 terms from the previous addition operation.
chain_add requires 1 less non-native field reduction than a regular add operation.

chain_add_start begins a chain of additions.

Parameters: input points p1, p2

Returns: accumulator = x3_prev (output x-coordinate), x1_prev, y1_prev (from p1), lambda_prev = (y2 - y1) / (x2 - x1)
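Putting the three calls together (a usage sketch built from the declarations above; P1..P4 are hypothetical circuit points, and all x-coordinates are assumed distinct):

```cpp
// Add four points with intermediate y-coordinates elided.
auto acc = element::chain_add_start(P1, P2); // first addition, keeps lambda/x1/y1
acc = element::chain_add(P3, acc);           // 2 (not 3) non-native multiplications
acc = element::chain_add(P4, acc);
element P1234 = element::chain_add_end(acc); // recover the full (x, y) output
```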
template<size_t wnaf_size, size_t staggered_lo_offset = 0, size_t staggered_hi_offset = 0> static secp256k1_wnaf_pair proof_system::plonk::stdlib::element<C, Fq, Fr, G>::compute_secp256k1_endo_wnaf(const Fr& scalar)
Split a secp256k1 Fr element into two 129-bit scalars klo, khi, where scalar = klo + \lambda * khi mod n, where \lambda is the cube root of unity mod n and n is the secp256k1 Fr modulus.
We return the wnaf representation of the two 129-bit scalars
The wnaf representation includes positive_skew and negative_skew components, because for both klo, khi EITHER k < 2^{129} OR -k mod n < 2^{129}. If we have to negate the short scalar, the wnaf skew component flips sign.
Outline of algorithm:
We will use our wnaf elements to index a ROM table. ROM index values act like regular array indices, i.e. start at 0, increase by 1 per index. We need the wnaf format to follow the same structure.
The mapping from wnaf value to lookup table point is as follows (example is 4-bit WNAF):
| wnaf witness value | wnaf real value | point representation |
|---|---|---|
| 0 | -15 | -15.[P] |
| 1 | -13 | -13.[P] |
| 2 | -11 | -11.[P] |
| 3 | -9 | -9.[P] |
| 4 | -7 | -7.[P] |
| 5 | -5 | -5.[P] |
| 6 | -3 | -3.[P] |
| 7 | -1 | -1.[P] |
| 8 | 1 | 1.[P] |
| 9 | 3 | 3.[P] |
| 10 | 5 | 5.[P] |
| 11 | 7 | 7.[P] |
| 12 | 9 | 9.[P] |
| 13 | 11 | 11.[P] |
| 14 | 13 | 13.[P] |
| 15 | 15 | 15.[P] |
The transformation between the wnaf witness value w and the wnaf real value v is, for an s-bit window:

v = 2.w - (2^s - 1)

To reconstruct the 129-bit scalar multiplier x from wnaf values w_i (starting with the most significant slice):

x = positive_skew - negative_skew + \sum_{i=0}^{m} (2.w_i - (2^s - 1)) . 2^{s.(m - i - 1)}

N.B. m = number of rounds = (129 + s - 1) / s

We can split the RHS into positive and negative components that are strictly positive:

x_pos = positive_skew + \sum_{i=0}^{m} 2.w_i . 2^{s.(m - i - 1)}

x_neg = negative_skew + \sum_{i=0}^{m} (2^s - 1) . 2^{s.(m - i - 1)}

By independently constructing x_pos, x_neg, we ensure we never underflow the native circuit modulus
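A standalone check of these identities (s = 4, three slices, arbitrary witness values; skew terms omitted for brevity):

```cpp
#include <cassert>
#include <cstdint>

// Verifies v = 2*w - (2^s - 1) slice-by-slice and the x = x_pos - x_neg split.
int main() {
    const int s = 4, m = 3;
    const int w[m] = { 8, 3, 12 };          // witness values, each in [0, 2^s)
    int64_t x = 0, x_pos = 0, x_neg = 0;
    for (int i = 0; i < m; ++i) {
        const int v = 2 * w[i] - ((1 << s) - 1); // real value: odd, in [-15, 15]
        x     = (x << s) + v;
        x_pos = (x_pos << s) + 2 * w[i];         // strictly positive terms
        x_neg = (x_neg << s) + ((1 << s) - 1);   // strictly positive terms
    }
    assert(x == x_pos - x_neg);             // neither component underflows
    return 0;
}
```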
To reconstruct our wnaf components into a scalar, we perform the following (for each 129-bit slice klo, khi):
1. Compute the wnaf entries and range constrain each entry to be < 2^s
2. Construct `x_pos`
3. Construct `x_neg`
4. Cast `x_pos`, `x_neg` into two Fr elements and compute `Fr reconstructed = Fr(x_pos) - Fr(x_neg)`

This ensures that the only negation is performed in the Fr representation, removing the risk of underflow errors.
Once klo, khi have been reconstructed as Fr elements, we validate the following:
1. `scalar == Fr(klo) - Fr(khi) * Fr(\lambda)`
Finally, we return the wnaf representations of klo, khi including the skew
The staggered offset describes the number of bits we want to remove from the input scalar before computing our wnaf slices. This enables us to make repeated calls to the montgomery ladder algo when computing a multi-scalar multiplication. e.g. consider an example with 2 points (A, B), using a 2-bit WNAF. The typical approach would be to perform a double-and-add algorithm, adding points into an accumulator ACC:

ACC = ACC.dbl()
ACC = ACC.dbl()
ACC = ACC.add(A)
ACC = ACC.add(B)

However, if the A and B WNAFs are offset by 1 bit each, we can perform the following:

ACC = ACC.dbl()
ACC = ACC.add(A)
ACC = ACC.dbl()
ACC = ACC.add(B)

which we can reduce to:

ACC = ACC.montgomery_ladder(A)
ACC = ACC.montgomery_ladder(B)
This is more efficient than the non-staggered approach as we save 1 non-native field multiplication when we replace a DBL, ADD subroutine with a call to the montgomery ladder
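In an integer model the substitution is a pure regrouping: each montgomery_ladder call computes 2*ACC + point, which is exactly one DBL fused with one ADD, so the staggered dbl/add/dbl/add schedule collapses into two ladder calls:

```cpp
#include <cassert>

// dbl/add/dbl/add versus two fused ladder steps (integers model the group).
int main() {
    const long ACC = 11, A = 3, B = 7;
    const long interleaved = ((ACC * 2 + A) * 2) + B; // dbl, add A, dbl, add B
    const long laddered    = (ACC * 2 + A) * 2 + B;   // ladder(A), then ladder(B)
    assert(interleaved == laddered);
    return 0;
}
```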
Compute WNAF of a single 129-bit scalar
Parameters:
| k | Scalar |
| stagger | The number of bits that are used in "staggering" |
| is_negative | If it should be subtracted |
| is_lo | True if it's the low scalar |
Compute the stagger-related part of WNAF and the final skew
Parameters:
| fragment_u64 | Stagger-masked lower bits of the scalar |
| stagger | The number of staggering bits |
| is_negative | If the initial scalar is supposed to be subtracted |
| wnaf_skew | The skew of the stagger-right-shifted part of the scalar |
Compute wnaf values, convert them into witness field elements and range constrain them
std::array<twin_rom_table<C>, 5> proof_system::plonk::stdlib::element<Builder, Fq, Fr, NativeGroup>::create_group_element_rom_tables(const std::array<element<Builder, Fq, Fr, NativeGroup>, num_elements>& rom_data, std::array<uint256_t, 8>& limb_max)
Constructs a ROM table to look up linear combinations of group elements.
Template Parameters: C, Fq, Fr, G, num_elements, typename

Parameters:
| rom_data | the ROM table we are writing into |
| limb_max | the maximum size of each limb in the ROM table |
When reading a group element out of the ROM table, we must know the maximum value of each coordinate's limbs. We take this value to be the maximum of the maximum values of the input limbs into the table!
static element proof_system::plonk::stdlib::element<C, Fq, Fr, G>::goblin_batch_mul(const std::vector<element>& points, const std::vector<Fr>& scalars, const size_t max_num_bits = 0)
Goblin style batch multiplication.
In goblin-style arithmetization, the operands (points/scalars) for each mul-accumulate operation are decomposed into smaller components and written to an operation queue via the builder. The components are also added as witness variables. This function adds constraints demonstrating the fidelity of the point/scalar decompositions given the indices of the components in the variables array. The actual mul-accumulate operations are performed natively (without constraints) under the hood, and the final result is obtained by queueing an equality operation via the builder. The components of the result are returned as indices into the variables array from which the resulting accumulator point is re-constructed.
Template Parameters:
| C | CircuitBuilder |
| Fq | Base field |
| Fr | Scalar field |
| G | Native group |

Parameters: points, scalars, max_num_bits
element<C, Fq, Fr, G> proof_system::plonk::stdlib::element<C, Fq, Fr, G>::montgomery_ladder(const chain_add_accumulator& to_add)
Implementation of montgomery_ladder using chain_add_accumulator.
If the input to montgomery_ladder is the output of a chain of additions, we can avoid computing the y-coordinate of the input to_add, which saves us a non-native field reduction.
We substitute to_add.y with lambda_prev * (to_add.x1_prev - to_add.x) - to_add.y1_prev
Here, x1_prev, y1_prev, lambda_prev are the values of x1, y1, lambda for the addition operation that PRODUCED to_add
The reason why this saves us gates, is because the montgomery ladder does not multiply to_add.y by any values i.e. to_add.y is only used in addition operations
This allows us to substitute to_add.y with the above relation without requiring additional field reductions
e.g. the term (lambda * (x3 - x1) + to_add.y) remains "quadratic" if we replace to_add.y with the above quadratic relation
element<C, Fq, Fr, G> proof_system::plonk::stdlib::element<C, Fq, Fr, G>::montgomery_ladder(const element<Builder, Fq, Fr, NativeGroup>& other) const
Compute one round of a Montgomery ladder, i.e. compute 2 * (*this) + other. We compute D = A + B + A, where A = *this and B = other.

We can skip computing the y-coordinate of C = A + B:

To compute D = A + C, with A = (x_1, y_1), we need the gradient of our two coordinates, specifically:

lambda_2 = (y_3 - y_1) / (x_3 - x_1) = (lambda_1 * (x_1 - x_3) - 2 * y_1) / (x_3 - x_1) = -lambda_1 - (2 * y_1) / (x_3 - x_1)

We don't need y_3 to compute this. We can then compute D.x and D.y as usual:

D.x = lambda_2 * lambda_2 - (C.x + A.x)
D.y = lambda_2 * (A.x - D.x) - A.y

Requires 5 non-native field reductions. Doubling and adding would require 6.
element<C, Fq, Fr, G> proof_system::plonk::stdlib::element<C, Fq, Fr, G>::multiple_montgomery_ladder(const std::vector<chain_add_accumulator>& add) const
Perform repeated iterations of the montgomery ladder algorithm.
For points P, Q, montgomery ladder computes R = (P + Q) + P i.e. it's "double-and-add" without explicit doublings.
This method can apply repeated iterations of the montgomery ladder. Each iteration reduces the number of field multiplications by 1, at the cost of more additions. (i.e. we don't compute intermediate y-coordinates).
The number of additions scales with the size of the input vector. The optimal input size appears to be 4.
Template Parameters: C, Fq, Fr, G

Parameters: add
element<C, Fq, Fr, G> proof_system::plonk::stdlib::element<C, Fq, Fr, G>::operator*(const Fr& scalar) const
Implements scalar multiplication.
For multiple scalar multiplication use one of the batch_mul methods to save gates.
Let's say we have some curve E defined over a field Fq. The order of E is p, which is prime.
Now let's say we are constructing a SNARK circuit over another curve E2, whose order is r.
All of our addition / multiplication / custom gates are going to be evaluating low degree multivariate polynomials modulo r.
E.g. our addition/mul gate (for wires a, b, c and selectors q_m, q_l, q_r, q_o, q_c) is:
q_m * a * b + q_l * a + q_r * b + q_o * c + q_c = 0 mod r
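For instance (a toy instantiation with modulus r = 101, not real circuit parameters), the selector choice q_m = 0, q_l = q_r = 1, q_o = -1, q_c = 0 turns the gate into an addition constraint a + b = c:

```cpp
#include <cassert>

// One satisfied addition gate over a toy modulus r = 101.
int main() {
    const long r = 101;
    const long a = 35, b = 50, c = (a + b) % r;  // witness wires
    const long q_m = 0, q_l = 1, q_r = 1, q_o = r - 1 /* -1 mod r */, q_c = 0;
    const long gate = (q_m * a * b + q_l * a + q_r * b + q_o * c + q_c) % r;
    assert(gate == 0);
    return 0;
}
```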
We want to construct a circuit that evaluates scalar multiplications of curve E, where q > r and p > r.
i.e. we need to perform arithmetic in one prime field, using prime field arithmetic in a completely different prime field.
To do this, we need to emulate a binary (or in our case quaternary) number system in Fr, so that we can use the binary/quaternary basis to emulate arithmetic in Fq. Which is very messy. See bigfield.hpp for the specifics.
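A scaled-down sketch of that idea (16-bit limbs inside 64-bit integers standing in for Fq limbs inside Fr elements; bigfield.hpp is the real implementation):

```cpp
#include <cassert>
#include <cstdint>

// Represent a wide value as narrow limbs and reconstruct it as
// sum_i limb[i] * 2^(16*i) -- the basis trick used to emulate Fq inside Fr.
int main() {
    const uint64_t value = 0xDEADBEEFCAFEull;  // wider than one limb
    uint64_t limb[4];
    for (int i = 0; i < 4; ++i) limb[i] = (value >> (16 * i)) & 0xFFFFull;
    uint64_t recovered = 0;
    for (int i = 3; i >= 0; --i) recovered = (recovered << 16) | limb[i];
    assert(recovered == value);
    return 0;
}
```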
element<C, Fq, Fr, G> proof_system::plonk::stdlib::element<C, Fq, Fr, G>::quadruple_and_add(const std::vector<element<Builder, Fq, Fr, NativeGroup>>& to_add) const
Compute 4.P + to_add[0] + ... + to_add[to_add.size() - 1].
Used in wnaf_batch_mul method. Combining operations requires fewer bigfield reductions.
The method computes R = (2P + A[0]) + (2P + A[1]) + A[2] + ... + A[n-1].
| C | |
| Fq | |
| Fr | |
| G |
| to_add |
element<C, Fq, Fr, G> proof_system::plonk::stdlib::element<Builder, Fq, Fr, NativeGroup>::secp256k1_ecdsa_mul(const element<Builder, Fq, Fr, NativeGroup>& pubkey, const Fr& u1, const Fr& u2)
Compute `out = u1.[1] + u2.[pubkey]`

Split scalar u1 into 129-bit short scalars u1_lo, u1_hi, where u1 = u1_lo + \lambda * u1_hi (\lambda is the cube root of unity modulo the secp256k1 group order).

Convert u1_lo and u1_hi into an 8-bit sliding window NAF. Our base point is the G1 generator. We have a precomputed size-256 plookup table of the generator point, multiplied by all possible wNAF values.
We also split scalar u2 using the secp256k1 endomorphism. Convert short scalars into 4-bit sliding window NAFs. We will store the lookup table of all possible base-point wNAF states in a ROM table (it's variable-base scalar multiplication in a SNARK with a lookup table! ho ho ho)
The wNAFs u1_lo_wnaf, u1_hi_wnaf, u2_lo_wnaf, u2_hi_wnaf are each offset by 1 bit relative to each other: we do not shift u2_lo; we right-shift u2_hi by 1 bit before computing its wNAF; we right-shift u1_lo by 2 bits; and we right-shift u1_hi by 3 bits.
We do this to ensure that we are never adding more than 1 point into our accumulator when performing our double-and-add scalar multiplication. It is more efficient to use the montgomery ladder algorithm, compared against doubling an accumulator and adding points into it.
The bits removed by the right-shifts are stored in the wnaf's respective least_significant_wnaf_fragment member variable
Construct our 4-bit variable-base and 8-bit fixed base lookup tables
main double-and-add loop
Acc = Acc + Acc
Acc = Acc + Acc
Acc = Acc + u2_hi_wnaf.[endoP2] + Acc
Acc = Acc + u2_lo_wnaf.[P2] + Acc
Acc = Acc + u1_hi_wnaf.[endoP1] + Acc
Acc = Acc + u1_lo_wnaf.[P1] + Acc
Acc = Acc + u2_hi_wnaf.[endoP2] + Acc
Acc = Acc + u2_lo_wnaf.[P2] + Acc
We add u2 points into the accumulator twice per 'round' as we only have a 4-bit lookup table (vs the 8-bit table for u1)
At the conclusion of this loop, we will need to add a final contribution from u2_hi, u1_lo, u1_hi. This is because we offset our wNAFs to take advantage of the montgomery ladder, but this means we have doubled our accumulator AFTER adding our final wnaf contributions from u2_hi, u1_lo and u1_hi
Add the final contributions from u2_hi, u1_lo, u1_hi
Handle wNAF skew.
Scalars represented via the non-adjacent form can only be odd. If our scalars are even, we must either add or subtract the relevant base point into the accumulator.
element<C, Fq, Fr, G> proof_system::plonk::stdlib::element<Builder, Fq, Fr, NativeGroup>::wnaf_batch_mul(const std::vector<element<Builder, Fq, Fr, NativeGroup>>& points, const std::vector<Fr>& scalars)
Only works for Plookup (otherwise falls back on batch_mul)! Multiscalar multiplication that utilizes 4-bit wNAF lookup tables; it is more efficient than points-as-linear-combinations lookup tables if the number of points is 3 or fewer.
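A usage sketch based on the declarations above (points and scalars are hypothetical circuit witnesses):

```cpp
// Generic path, works for every curve/builder:
element r1 = element::batch_mul(points, scalars);
// Plookup-optimized path (falls back to batch_mul otherwise):
element r2 = element::wnaf_batch_mul(points, scalars);
```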