-
Notifications
You must be signed in to change notification settings - Fork 6
Podlog language v1 #225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Podlog language v1 #225
Conversation
Some thoughts after reading the PR description (I haven't reviewed the code yet)
This would apply in generals to literal values that appear in templates. Another example would be a request that involves a statement to show that someone is over 18 years old. To do that we need to calculate a value that is 18 years in the past from now, and this value will change every time we do a request; so being able to pass this value externally would be useful. Questions
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've finished reviewing this PR and I'm marking request changes because there are significant changes to the approach of this PR that I want to discuss.
src/lang/error.rs
Outdated
#[derive(Error, Debug)] | ||
pub enum LangError { | ||
#[error("Parsing failed: {0}")] | ||
Parse(#[from] Box<ParseError>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This #[from]
is generating the code for impl From<Box<ParseError>> for LangError
which I think is not used anywhere. And the same applies to the other cases.
I suggest removing these #[from]
tags for all error cases because you already implement the From
manually for the type you want.
src/lang/parser.rs
Outdated
#[derive(thiserror::Error, Debug)] | ||
pub enum ParseError { | ||
#[error("Pest parsing error: {0}")] | ||
Pest(#[from] Box<pest::error::Error<Rule>>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest removing the #[from]
. See my other comment on the other error type.
src/lang/processor.rs
Outdated
let hex_str = hex::encode(&bytes); | ||
let padded_hex_str = format!("{:0>64}", hex_str); | ||
parse_hex_to_raw_value(&padded_hex_str).map(Value::from) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks over complicated. You don't need to encode the bytes into a hex string to do that.
A RawValue is 4 goldiclocks field elements, which can be represented with 64 bits each.
You just need to build each goldilock field element with 64/8 = 8 bytes from the bytes
vector.
Also you need to make a decision on the endianness of the bytes
vector (and please document the endianness in the grammar).
Also to simplify things I would change the grammar to force the literal_raw to be 32 bytes (or 64 hex characters).
Finally: the goldilocks field is smaller than 2^64-1. This means that some values have aliases, for example 1
and p-1
fit into 64 bits and are the same value but look different when encoded into bytes. I suggest that the parser checks that each array of bytes corresponding to a field element represents an integer that is smaller than p
. The method GoldilocksField::from_canonical_u64
performs this check but only in debug mode, not in release. We should check n < Self::GoldilocksField
and return an error if that's not the case.
b070e73
to
6757d8d
Compare
span: Option<(usize, usize)>, | ||
}, | ||
#[error("Frontend error: {0}")] | ||
Frontend(frontend::Error), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small suggestion which allows removing the impl From<frontend::Error> for ProcessorError
Frontend(frontend::Error), | |
Frontend(#[from] frontend::Error), |
src/lang/processor.rs
Outdated
let name_only = | ||
full_name | ||
.strip_prefix("?") | ||
.ok_or_else(|| ProcessorError::Semantic { | ||
message: format!("Invalid wildcard format for BuilderArg: {}", full_name), | ||
span: Some(get_span(arg_content_pair)), | ||
})?; | ||
if name_only.is_empty() { | ||
return Err(ProcessorError::Semantic { | ||
message: "Wildcard name cannot be empty after '?' for BuilderArg".to_string(), | ||
span: Some(get_span(arg_content_pair)), | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you're doing two checks:
Make sure there's a ?
prefix and making sure the identifier is not empty. But the grammar already enforces this!
So by the definition of the grammar, these two errors will never occur.
More over, if they were to happen, it would be a bug in the code because either pest is not returning a parsing error, or the grammar is not well defined. For those cases I strongly suggest that we panic instead, and we can do this cleanly with expect
.
Please take a look at this commit where I've made some changes to simplify this following this idea:
b806322
I think there are a few more places where this changes can be applied. The only place where we need to do some checking and do proper error reporting is when handling native predicates because their argument types are not validated by the grammar.
But any time we find a rule defined in the grammar, we know that the tokens defined by the grammar will be there. For example, every time we find the Rule::anchored_key
we know for sure that we'll encounter 2 Pairs. The first one will either be self_keyword
or wildcard
; and the second one will either be wildcard
or literal_string
. Any other case is not expected to happen on any kind of user input.
Ok(result) | ||
} | ||
|
||
fn parse_hex_str_to_raw_value(hex_str: &str) -> Result<middleware::RawValue, ProcessorError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized we already had this functionality here:
pod2/src/middleware/basetypes.rs
Lines 209 to 218 in d3fef83
fn from_hex<T: AsRef<[u8]>>(hex: T) -> Result<Self, Self::Error> { | |
// In little endian | |
let bytes = <[u8; 32]>::from_hex(hex)?; | |
let mut buf: [u8; 8] = [0; 8]; | |
let mut inner = [F::ZERO; HASH_SIZE]; | |
for i in 0..HASH_SIZE { | |
buf.copy_from_slice(&bytes[8 * i..8 * (i + 1)]); | |
inner[i] = F::from_canonical_u64(u64::from_le_bytes(buf)); | |
} | |
Ok(Self(inner)) |
It's used to deserialize a Hash
from a hex string, and internally it creates the 4 field elements, which is the same we need to get a RawValue. Then you can do RawValue::from(h: Hash)
.
The encoding is different though, and it doesn't use a 0x
prefix.
The encoding uses little endian, and taking a little endian hex encoded value and adding a 0x
prefix is confusing from my point of view.
} | ||
Rule::EOI => break, | ||
Rule::COMMENT | Rule::WHITESPACE => {} | ||
_ => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We call first_pass
after checking we're in a document
, and by definition there are no more rules in document
, so this should be unreachable
instead of error
SelfOrWildcardStr::Wildcard(name.to_string()) | ||
} | ||
Rule::self_keyword => SelfOrWildcardStr::SELF, | ||
_ => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the anchored_key
rule by grammar definition can only wildcard
or self_keyword
for the first pair, so this should be unreachable
.
let key_str_literal = parse_pest_string_literal(&key_part_pair)?; | ||
KeyOrWildcardStr::Key(key_str_literal) | ||
} | ||
_ => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the second pair in anchored_key
by definition can only be wildcard
or literal_string
, so this should be unreachable
.
}; | ||
Ok(BuilderArg::Key(pod_self_or_wc_str, key_or_wildcard_str)) | ||
} | ||
_ => Err(ProcessorError::RuleMismatch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we call pest_pair_to_builder_arg
after checking that the rule is statement_arg
and we've already checked all possible pair rules, so this should be unreachable
.
} | ||
Rule::private_kw | Rule::COMMENT | Rule::WHITESPACE => {} | ||
_ if arg_part_pair.as_str() == "," => {} | ||
_ => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arg_section
rule by grammar definition doesn't have more cases, so this should be unreachable
.
let conjunction = match conjunction_type_pair.as_str() { | ||
"AND" => true, | ||
"OR" => false, | ||
_ => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By grammar definition conjunction_type
is only AND
or OR
, so this should be unreachable.
})?; | ||
Ok(Value::from(middleware_dict)) | ||
} | ||
_ => Err(ProcessorError::RuleMismatch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We call process_literal_value
after checking that the pair rule is literal_value
which doesn't have more cases by definition, so this should be unreachable
.
This adds a simple DSL for specifying custom predicates and proof requests. It is similar to the language we have been using informally in our docs. The code includes tests, and should be capable of supporting all of our current predicates, with the exception of
SetContains
/SetNotContains
. A language reference is included below.Example
A proof request for ZuKYC:
Ethdos custom predicates:
Code structure
The grammar is implemented using Pest, which is able to derive a basic parser implementation from a grammar file. We have tests for this in
parser.rs
. The parsed input is then turned into anAST
by the code inast.rs
. This closely mirrors the internal POD2 data structures, but may diverge in some ways (for example, by allowing certain pieces of syntactic sugar). Finally the AST is turned into POD2 internal data structures inprocessor.rs
. There are end-to-end tests inmod.rs
.Outstanding issues
Currently
SetContains
andSetNotContains
are unsupported, as de-sugaring these to lower-level templates might require some changes inMainPodBuilder
or elsewhere. See #224Next steps
If we're happy with the grammar here, I will update the code examples in the book to use this syntax.
Possible future features
A few features have been considered and are currently out-of-scope, but could be included in future:
Multiple custom predicate batches
For a given Podlog document, all custom predicates will be included in the same batch. We might want to give the user more control by allowing them to specify separate batches, e.g.:
In this case, the call to
eth_friend
would be made via aCustomPredicateRef
rather than aBatchSelf
.Import syntax
It might be helpful to be able to import custom predicates from different files. For example:
Given that each custom predicate has a unique hash, we could also imagine a content-addressable store of custom predicates:
Language reference
Language reference
Podlog Language Syntax
Podlog is a declarative language used for defining custom proof logic and requests within the POD2 ecosystem. It allows for the specification of statement templates that can be evaluated by a prover.
Table of Contents
Overview
A Podlog document consists of zero or more top-level definitions, which can be either custom predicate definitions or a single request definition. The language is designed to be human-readable while providing precise instructions for the POD2 prover.
Comments
Single-line comments begin with
//
. All text from//
to the end of the line is ignored.Identifiers
Identifiers are used for naming custom predicates and for the non-prefixed part of variable names.
_
).private
,REQUEST
,AND
,OR
, and native predicate names (e.g.,Equal
,ValueOf
) cannot be used as predicate identifiers.private
cannot be used as an identifier (e.g., for variable names).Examples:
my_predicate
_internal_check
isValidUser1
Invalid:
123starts_with_digit
has-hyphen
private
(as a predicate or variable name component)Equal
(as a predicate name)Variables
Variables represent placeholders for values that the prover will attempt to bind.
?
).?
must adhere to the rules for Identifiers.Examples:
?UserPod
?publicKey
?_tempValue
Literals
Literals represent fixed values.
Integer
A sequence of digits, optionally prefixed with a
-
for negative numbers.Maps to
middleware::Value::Int
.Examples:
123
,-45
,0
String
Characters enclosed in double quotes (
"
).Standard JSON-like escaping rules apply for special characters:
\"
for double quote\\
for backslash/
for forward slash (optional,/
can be unescaped)\b
for backspace\f
for form feed\n
for newline\r
for carriage return\t
for tab\uXXXX
for a Unicode character specified by 4 hexadecimal digits.Maps to
middleware::Value::String
.Examples:
"hello"
,"user:name"
,"escaped \\\" quote"
,"\u0041"
Raw (Bytes)
A hexadecimal string prefixed with
0x
. Must contain an even number of hex digits (0-9, a-f, A-F) and at least two digits.Maps to
middleware::Value::Raw
.Examples:
0xdeadbeef
,0x00
,0x0102030405060708090a0b0c0d0e0f10
Invalid:
0xabc
(odd length),0x
(empty),0xGG
(invalid hex chars)Boolean
The keywords
true
orfalse
.Maps to
middleware::Value::Bool
.Examples:
true
,false
Array
A comma-separated list of literals enclosed in square brackets (
[]
). Elements can be of mixed types.Maps to
middleware::Value::Array
.Examples:
[]
,[1, "two", true]
,[ [10, 20], #["a", "b"] ]
Set
A comma-separated list of literals enclosed in square brackets prefixed with a hash (
#[]
). Elements can be of mixed types. The processor will validate uniqueness if required by the underlying middleware.Maps to
middleware::Value::Set
.Examples:
#[]
,#[1, 2, "apple"]
,#[ 0x01, 0x02 ]
Dictionary
A comma-separated list of
string_key: literal_value
pairs enclosed in curly braces ({}
). Keys must be string literals.Maps to
middleware::Value::Dictionary
.Examples:
{}
,{ "name": "Alice", "age": 30 }
,{ "data": [1,2,3], "metadata": { "version": "1.0" } }
Anchored Keys
Anchored keys are used to refer to values within a POD (represented by a variable).
Syntax:
?PodVariable["key_literal"]
or?PodVariable[?KeyVariable]
?PodVariable
: A Variable representing a POD."key_literal"
: A String Literal representing a fixed key.?KeyVariable
: A Variable representing a key to be resolved at runtime.Examples:
?UserPod["name"]
?DataPod[?fieldName]
Top-Level Definitions
A Podlog document can contain multiple custom predicate definitions and at most one request definition.
Custom Predicate Definition
Defines a reusable piece of logic.
Syntax:
predicate_name(public_arg1, public_arg2, ..., private: private_arg1, private_arg2, ...) = CONJUNCTION_TYPE( statement1 statement2 ... )
predicate_name
: An Identifier for the predicate., private:
and then a comma-separated list of private argument names (identifiers).pred(private: PrivA) = ...
), or no arguments (e.g.,pred() = ...
).?public_arg1
) within the predicate's scope.CONJUNCTION_TYPE
: EitherAND
orOR
.AND
: All statements in the body must be true.OR
: At least one statement in the body must be true.statement_list
: One or more Statements enclosed in parentheses.Examples:
Request Definition
Defines the main set of statements to be proven by the prover. There can be at most one
REQUEST
block in a document.Syntax:
REQUEST( statement1 statement2 ... )
statement_list
: Optional. If present, one or more Statements enclosed in parentheses. Variables used here are implicitly defined and their bindings are sought by the prover.Example:
Statements
Statements are the core assertions or operations within custom predicate definitions and request blocks.
Native Predicate Calls
These calls invoke built-in operations provided by the POD2 middleware. Arguments to native predicates are typically Anchored Keys, except for
ValueOf
which takes an Anchored Key and a Literal.Core Predicates (Directly map to middleware operations):
ValueOf(?PodVar["key"], LiteralValue)
?PodVar["key"]
is equal toLiteralValue
.ValueOf(?Config["port"], 8080)
Equal(?PodVar1["key1"], ?PodVar2["key2"])
Equal(?User["id"], ?Order["userId"])
NotEqual(?PodVar1["key1"], ?PodVar2["key2"])
NotEqual(?Account["status"], "suspended")
Lt(?PodVar1["key1"], ?PodVar2["key2"])
value(?PodVar1["key1"]) < value(?PodVar2["key2"])
.Lt(?Event["timestamp"], ?CutoffTime["value"])
LtEq(?PodVar1["key1"], ?PodVar2["key2"])
value(?PodVar1["key1"]) <= value(?PodVar2["key2"])
.LtEq(?User["age"], ?MaxAge["limit"])
Contains(?PodVar[?Container], ?PodVar2[?Key], ?PodVar2[?Value])
?Value
at a location specified by?Key
. In most cases, users should useDictContains
,SetContains
, orArrayContains
based on the expected type of the container.Contains(?Data["record"], ?User["nameKey"], ?User["expectedName"])
NotContains(?PodVar[?Container], ?PodVar2[?Key])
?Key
is not present in?Container
.NotContains(?Banlist["users"], ?User["id"])
SumOf(?Target["keyT"], ?Source1["keyS1"], ?Source2["keyS2"])
?Target["keyT"]
equals the sum of the values at?Source1["keyS1"]
and?Source2["keyS2"]
.SumOf(?Totals["all"], ?SubTotals["A"], ?SubTotals["B"])
ProductOf(?Target["keyT"], ?Source1["keyS1"], ?Source2["keyS2"])
?Target["keyT"]
equals the product of the values at?Source1["keyS1"]
and?Source2["keyS2"]
.ProductOf(?Area["rect"], ?Dimensions["width"], ?Dimensions["height"])
MaxOf(?Target["keyT"], ?Source1["keyS1"], ?Source2["keyS2"])
?Target["keyT"]
is equal to either of the values at?Source1["keyS1"]
or?Source2["keyS2"]
, whichever is higher.MaxOf(?Score["high"], ?Player1["score"], ?Player2["score"])
HashOf(?Target["keyT"], ?Source1["keyS1"], ?Source2["keyS2"])
value(?Target["keyT"])
is the hash ofvalue(?Source1["keyS1"])
andvalue(?Source2["keyS2"])
.HashOf(?User["computedId"], ?User["email"], ?User["salt"])
Syntactic Sugar Predicates (Translated by the processor):
These provide more convenient syntax for common operations and are translated into the core middleware predicates.
Gt(?PodVar1["key1"], ?PodVar2["key2"])
value(?PodVar1["key1"]) > value(?PodVar2["key2"])
.Lt(?PodVar2["key2"], ?PodVar1["key1"])
(arguments are swapped).Gt(?User["score"], 1000)
GtEq(?PodVar1["key1"], ?PodVar2["key2"])
value(?PodVar1["key1"]) >= value(?PodVar2["key2"])
.LtEq(?PodVar2["key2"], ?PodVar1["key1"])
(arguments are swapped).GtEq(?Balance["amount"], ?Minimum["required"])
DictContains(?DictVar["dict_key"], ?KeyToFind["key_name"], ?ExpectedValue["val_name"])
value(?DictVar["dict_key"][?KeyToFind["key_name"]]) == value(?ExpectedValue["val_name"])
.Contains(?DictVar["dict_key"], ?KeyToFind["key_name"], ?ExpectedValue["val_name"])
(or similar, depending on exact mapping to generic Contains).DictContains(?UserPod["attributes"], ?_KeyNamePod["name"], ?_ExpectedValuePod["admin"])
?_KeyNamePod["name"]
might resolve to a literal key like "role" and?_ExpectedValuePod["admin"]
to "admin".DictNotContains(?DictVar["dict_key"], ?KeyToFind["key_name"])
?KeyToFind["key_name"]
is not a key invalue(?DictVar["dict_key"])
.NotContains(?DictVar["dict_key"], ?KeyToFind["key_name"])
.DictNotContains(?UserPod["flags"], ?_FlagNamePod["is_banned"])
ArrayContains(?ArrayVar["array_key"], ?IndexVar["idx_name"], ?ExpectedValue["val_name"])
value(?ArrayVar["array_key"][?IndexVar["idx_name"]]) == value(?ExpectedValue["val_name"])
.Contains(?ArrayVar["array_key"], ?IndexVar["idx_name"], ?ExpectedValue["val_name"])
.ArrayContains(?LogPod["entries"], ?_IndexPod["target_idx"], ?_ValuePod["error_code_X"])
(Note:
SetContains
andSetNotContains
are currently missing, pending an implementation for de-sugaring.)Custom Predicate Calls
Calls a previously defined custom predicate.
Syntax:
predicate_name(arg1, arg2, ...)
predicate_name
: The Identifier of the custom predicate to call.arg_list
: A comma-separated list of arguments. Arguments can be Variables or Literals. The number and type of arguments must match the public argument definition of the called predicate.Example: