-
Notifications
You must be signed in to change notification settings - Fork 214
Full crabbake #1855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Full crabbake #1855
Conversation
So, I tried to test this PR with the CLDR full dataset and I discovered that it generates files with 100k lines (pretty much obvious considering the size of the dataset). However, the massive files causes Rust Analyzer to freeze my whole system when loading the generated module. I don't think that's intentional, right? Maybe this could be solved with |
I ran
|
Ah that analyzer. You can add your generated files to excludeDirs. Does the checked in test data crash it as well? |
Yep, that's my solution for now. Unfortunately it has to be set for every IDE independently, since Rust Analyzer doesn't have a LSP-agnostic way to exclude files from a scan.
Nope, it works perfectly fine with a small subset of language ids! I'm theorizing that the problem appears because RA tries to parse the whole 100k lines of a file at the same time. Maybe this could be alleviated by splitting those files into smaller chunks? |
It would be nice if Rust Analyzer had a per-file annotation that we could add to each individual .rs file that gets generated. |
Yeah, I thought RA had some editor-agnostic way of skipping a file or function from being analyzed, but apparently they're still working on it. (rust-lang/rust-analyzer#7449, rust-lang/rust-analyzer#6113) |
Ah, apparently you can bypass IDEs using the That could also make it possible to separate the declaration of each |
It's a little bit hacky but not a bad idea since these are really intended to be treated more as data files. Is there a better extension than
|
Tested with I should point out that only a single expression or item is allowed inside the file, so if //! def.inc
{
static DATE: length::Date = length::Date::Long;
length::Bag::from_date_time_style(DATE, length::Time::Medium).into()
} //! main.rs
use icu::datetime::{mock::parse_gregorian_from_str, options::length, DateTimeFormat};
use icu::locid::locale;
mod icu4x;
fn main() {
let date = parse_gregorian_from_str("2020-10-14T13:21:00").unwrap();
let options = include!("def.inc");
let dtf = DateTimeFormat::try_new(locale!("pl"), &icu4x::BakedDataProvider, &options).unwrap();
let formatted_date = dtf.format(&date);
println!("📅: {}", formatted_date)
} |
I'm hesitant to change the extension, because then we will lose Rust features like syntax highlighting for small files as well. I also suspect that This sounds like a bug in Rust Analyzer for which there is a workaround, so I don't think it's something we need to fix. That said, I'm happy to include any required attributes; I see that |
If we use .inc, it's still possible to configure one's own personal IDE to do the syntax highlighting as Rust. It's just a matter of what the default should be, and I think it is not a bad idea for the default to be treating these files as generic text files. |
But then you might as well configure your IDE to not rust-analyze them. These are Rust files so they should be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, done reviewing. There are parts of this we need to discuss.
@robertbastian Thank you for the advice! I could successfully integrate the module without causing RA to lag. Can't wait to see this PR merged! Also, do you have plans on making |
Does Rust 1.61 address some of the issues that we had from this PR such as a way to const-construct ZeroMap? |
Yes |
It'll still be somewhat ugly, but crucially the ugliness can be packed away into a method, rather than being splayed out in crabbake output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the fields of zerovec types private again. Suggestion: make this a separate, standalone PR that adds const constructors for all the types.
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
^squash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the collator and normalizer parts. My (non-blocking) comment about alignment applies to all ZeroVec
backing stores when the ZeroVec
item is larger than one byte.
&'static <::icu_collator::provider::CollationDiacriticsV1Marker as DataMarker>::Yokeable; | ||
static UND: DataStruct = &::icu_collator::provider::CollationDiacriticsV1 { | ||
ce32s: unsafe { | ||
static DATA: &[u8] = &[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: (Can be a follow-up.) While I see that the decision in December means that making this &[u32]
without ZeroVec
in between is "optional" goal-wise, it seems to me that it would be relatively feasible to align this to a 4-byte boundary. ZeroVec
would still have to generate unaligned load instructions the access would be from an actually-aligned address. (Disclaimer: I haven't benchmarked unaligned ALU-type load instructions from actually-aligned addresses relative to unaligned addresses, but at least on the SIMD side it makes a difference, IIRC.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would best be discussed on #1926. Having unaligned loads on crabbaked ZeroVecs is not ideal, but it's the obvious implementation. Maybe in the future we can add a BorrowedStatic
enum version that contains a &[T]
, but we have to consider the trade-off between code size and runtime gains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My remaining open comments can be addressed in a follow-up PR if it's easier.
@@ -2,8 +2,8 @@ | |||
// called LICENSE at the top level of the ICU4X source tree | |||
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ). | |||
|
|||
#[macro_use] | |||
mod macros; | |||
#![allow(clippy::exhaustive_structs)] // Field and FieldULE part of data struct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit (optional): I prefer to have the exceptions explicit on each struct
// FIXME: This should be replaced with a custom derive. | ||
// See: https://github.com/unicode-org/icu4x/issues/1044 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The FIXME points to an issue that is closed. Please resolve the FIXME or make it point to an open issue.
(I'm aware that this is copied from macros.rs; I wouldn't have noticed this if you hadn't moved the code to the new file)
@@ -84,6 +84,7 @@ mod error; | |||
pub mod maps; | |||
mod props; | |||
pub mod provider; | |||
#[allow(clippy::exhaustive_structs)] // TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Add issues for TODOs.
Fixes #1290