-
Notifications
You must be signed in to change notification settings - Fork 606
Add support for table valued functions for SQL Server #1839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3686b1b
to
db1c9b2
Compare
CREATE FUNCTION some_inline_tvf(@foo INT, @bar VARCHAR(256)) \ | ||
RETURNS TABLE \ | ||
AS \ | ||
RETURN (SELECT 1 AS col_1)\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parentheses are optional for inline tvf return
queries, although I think the subquery expr expects/requires them currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added support for that syntax & added a new test case example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UNION
is also supported in this syntax but not in this current approach due to using parse_select, which is restricted. I'm comfortable leaving that for later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
took a quick look and left a couple comments, @aharpervc could you rebase on main now that the other PR has landed, in order to remove the extra diff?
src/parser/mod.rs
Outdated
if self.peek_keyword(Keyword::AS) { | ||
self.expect_keyword_is(Keyword::AS)?; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if self.peek_keyword(Keyword::AS) { | |
self.expect_keyword_is(Keyword::AS)?; | |
} | |
self.parse_keyword(Keyword::AS); |
this looks like it could be simplified as above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
src/ast/data_type.rs
Outdated
if fields.is_empty() { | ||
return write!(f, "TABLE"); | ||
} | ||
write!(f, "TABLE({})", display_comma_separated(fields)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of using the is_empty
we can make the fields
value an Option
and skip the parenthesis only if fields
is None
. Otherwise we could run into issues if empty fields is allowed in the other variant of this datatype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
1b92ede
to
999964c
Compare
Done 👍 |
src/ast/data_type.rs
Outdated
NamedTable( | ||
/// Table name. | ||
ObjectName, | ||
/// Table columns. | ||
Vec<ColumnDef>, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NamedTable( | |
/// Table name. | |
ObjectName, | |
/// Table columns. | |
Vec<ColumnDef>, | |
), | |
NamedTable { | |
/// Table name. | |
table: ObjectName, | |
/// Table columns. | |
columns: Vec<ColumnDef>, | |
}, |
we can use an anonymous struct in this manner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, yes we can. Not sure why I did it that way. Done 👍
Table(Vec<ColumnDef>), | ||
/// [MsSQL]: https://learn.microsoft.com/en-us/sql/t-sql/statements/create-function-transact-sql?view=sql-server-ver16#c-create-a-multi-statement-table-valued-function | ||
Table(Option<Vec<ColumnDef>>), | ||
/// Table type with a name, e.g. CREATE FUNCTION RETURNS @result TABLE(...). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a link to the docs that support NamedTable
variant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
/// RETURNS TABLE | ||
/// AS RETURN SELECT a + b AS sum; | ||
/// ``` | ||
AsReturnSelect(Select), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can also include a reference doc link here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
src/parser/mod.rs
Outdated
})); | ||
Ok(DataType::NamedTable( | ||
ObjectName(vec![ObjectNamePart::Identifier(return_table_name)]), | ||
table_column_defs.clone().unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we return an error instead of the unwrap here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworked this code to address your several comments 👍
src/parser/mod.rs
Outdated
self.expect_keyword_is(Keyword::AS)?; | ||
let return_table = self.maybe_parse(|p| { | ||
let return_table_name = p.parse_identifier()?; | ||
let table_column_defs = if p.peek_keyword(Keyword::TABLE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can replace the if/else statement here with self.expect_keyword(Keyword::TABLE)
src/parser/mod.rs
Outdated
let statements = self.parse_statement_list(&[Keyword::END])?; | ||
let end_token = self.expect_keyword(Keyword::END)?; | ||
if table_column_defs.is_none() | ||
|| table_column_defs.clone().is_some_and(|tcd| tcd.is_empty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm is the clone()
here necessary?
src/parser/mod.rs
Outdated
let return_table_name = p.parse_identifier()?; | ||
let table_column_defs = if p.peek_keyword(Keyword::TABLE) { | ||
match p.parse_data_type()? { | ||
DataType::Table(t) => t, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think already here we can check that the returned data type is none empty, that would avoid the option invalid case propagating further below (where we have the is_none and is_some checks). e.g.
DataType::Table(Some(t)) if !t.is_empty() => t
src/parser/mod.rs
Outdated
})); | ||
Ok(DataType::NamedTable( | ||
ObjectName(vec![ObjectNamePart::Identifier(return_table_name)]), | ||
table_column_defs.clone().unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly here, is the clone
necessary?
src/parser/mod.rs
Outdated
if !matches!(expr, Expr::Subquery(_)) { | ||
parser_err!( | ||
"Expected a subquery after RETURN", | ||
self.peek_token().span.start | ||
)? | ||
} | ||
Some(CreateFunctionBody::AsReturnSubquery(expr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if !matches!(expr, Expr::Subquery(_)) { | |
parser_err!( | |
"Expected a subquery after RETURN", | |
self.peek_token().span.start | |
)? | |
} | |
Some(CreateFunctionBody::AsReturnSubquery(expr)) | |
Some(CreateFunctionBody::AsReturnExpr(expr)) |
thinking since a subquery is already an expr, we can be more permissive here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a "could" vs "should" situation. The question here is what is the parser expecting? I don't think it makes sense to parse arbitrary expr's if they wouldn't otherwise be allowed by any sql engine... that seems better as a parse failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ideally we would parse statements accordingy to spec but the parser is more permissive in some cases, especially in order to minimize complexity in the parser. In this case we would avoid introducing an extra AsReturnSubquery
enum variant, as well as the matches!()
check that does the validation after the fact of parsing an expression which isn't ideal. So I think we'd be better off leaving the validation to the downstream crate in this scenario
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, yes I suppose that makes sense. It seems odd for the parser to produce output if it's known to be invalid, but I suppose you're saying, the validity question is in the semantics rather than the syntax.
Loosening the implementation to any Expr would probably mean that select .. union .. select
syntax would begin parsing, which addresses my concern above.
I'll investigate...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary reason to have both AsReturnSubquery
and AsReturnSelect
is to distinguish between AS RETURN (SELECT a + b AS sum)
and AS RETURN SELECT a + b AS sum
.
In other PR's, such as recent work on BeginEndStatements, I believe you had recommended that with/without wrapping tokens should be distinct enum variants instead of something like present/empty AttachedTokens. Based on that guidance, it seems like it'd also fit to have two distinct enum variants here as well.
There is your additional concern regarding "subquery" vs "expr". To address this I have renamed AsReturnSubquery
to AsReturnExpr
& removed the non-subquery parser error.
So I think this concern is now resolved
src/parser/mod.rs
Outdated
if self.peek_token() != Token::LParen { | ||
Ok(DataType::Table(None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a comment here explaining what the LParen check is for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, and I simplified the if/else 👍
src/parser/mod.rs
Outdated
if !matches!(expr, Expr::Subquery(_)) { | ||
parser_err!( | ||
"Expected a subquery after RETURN", | ||
self.peek_token().span.start | ||
)? | ||
} | ||
Some(CreateFunctionBody::AsReturnSubquery(expr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ideally we would parse statements accordingy to spec but the parser is more permissive in some cases, especially in order to minimize complexity in the parser. In this case we would avoid introducing an extra AsReturnSubquery
enum variant, as well as the matches!()
check that does the validation after the fact of parsing an expression which isn't ideal. So I think we'd be better off leaving the validation to the downstream crate in this scenario
88c5507
to
d509d4e
Compare
@iffyio concerns addressed & rebased on latest master. I'm unsure how to address the linting CI job errors and I can't reproduce the report locally, so I may need some guidance on that. |
- rename `AsReturn` to `AsReturnSubquery` for clarity between these two variants
- plus, flip the if/else to positive equality for simplicity
d509d4e
to
8bc8d38
Compare
It seems the CI failure for the lint job was addressed by #1856. I've rebased again, all good 👍 |
src/parser/mod.rs
Outdated
if !p.peek_keyword(Keyword::TABLE) { | ||
parser_err!( | ||
"Expected TABLE keyword after return type", | ||
p.peek_token().span.start | ||
)? | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if !p.peek_keyword(Keyword::TABLE) { | |
parser_err!( | |
"Expected TABLE keyword after return type", | |
p.peek_token().span.start | |
)? | |
} | |
p.expect_keyword(Keyword::TABLE)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I didn't do it this way is because expect_keyword (and expect_keyword_is) consume the token. That causes parse_data_type to break, because it uses the TABLE keyword to understand it should parse the table data type.
However, I suppose that I could just call prev_token()
after expect to undo consuming the token. I will make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
src/parser/mod.rs
Outdated
DataType::Table(maybe_table_column_defs) => match maybe_table_column_defs { | ||
Some(table_column_defs) => { | ||
if table_column_defs.is_empty() { | ||
parser_err!( | ||
"Expected table column definitions after TABLE keyword", | ||
p.peek_token().span.start | ||
)? | ||
} | ||
|
||
table_column_defs | ||
} | ||
None => parser_err!( | ||
"Expected table column definitions after TABLE keyword", | ||
p.peek_token().span.start | ||
)?, | ||
}, | ||
_ => parser_err!( | ||
"Expected table data type after TABLE keyword", | ||
p.peek_token().span.start | ||
)?, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataType::Table(maybe_table_column_defs) => match maybe_table_column_defs { | |
Some(table_column_defs) => { | |
if table_column_defs.is_empty() { | |
parser_err!( | |
"Expected table column definitions after TABLE keyword", | |
p.peek_token().span.start | |
)? | |
} | |
table_column_defs | |
} | |
None => parser_err!( | |
"Expected table column definitions after TABLE keyword", | |
p.peek_token().span.start | |
)?, | |
}, | |
_ => parser_err!( | |
"Expected table data type after TABLE keyword", | |
p.peek_token().span.start | |
)?, | |
}; | |
DataType::Table(Some(table_column_defs)) !if table_column_defs.is_empty() => table_column_defs, | |
_ => parser_err!( | |
"Expected table data type after TABLE keyword", | |
p.peek_token().span.start | |
)?, | |
}; |
looks like this condition can be simplified?
Also could we add a negative test to assert the behavior on invalid data_types, noticed it seems to be lacking coverage in the PR tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I consolidated this per your suggestion, and I added a test to assert a parser error for an incorrect table definition 👍
parser_err!( | ||
"Expected a subquery (or bare SELECT statement) after RETURN", | ||
self.peek_token().span.start | ||
)? | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a test for this behavior? (e.g. one that passes a regular expression and the verifies that the parser rejects it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
This PR adds support for table valued functions for SQL Server, both inline & multi statement functions. For reference, that's the B & C documentation here: https://learn.microsoft.com/en-us/sql/t-sql/statements/create-function-transact-sql?view=sql-server-ver16#b-create-an-inline-table-valued-function
Inline TVF's are defined with
AS RETURN
, so we have a newCreateFunctionBody::AsReturn
variant accordingly. Functions using "AS RETURN" don't have BEGIN/END, so that part of the parsing logic is now conditional. Additionally, the data type parser now supports "RETURNS TABLE" without a table definition.Multi statement TVF's use named table expressions, so a new
NamedTable
data type variant was added. I didn't see a great way to integrate this into the existing data type parser (especially without rewinding), so creating this data type happens inside the parse create function logic first by parsing the identifier, then parsing the table definition, then using those elements to produce a NamedTable.I also added a new test example for each of these scenarios.