x/build: shard and scale the longtest SlowBots #37439

Closed
@bcmills

I missed a Windows test failure in CL 220645 because I forgot to run it against the windows-amd64-longtest SlowBot. I forgot to run it against that SlowBot because I'm not in the habit of doing so.

I'm not in the habit of running that SlowBot because it is currently much too slow. To pick some relevant runs:

  • The first run on CL 220717 started at 5:07 PM and completed at 5:31 PM (24 minutes).
  • The second run on that CL started at 5:51 PM and completed at 6:33 PM (42 minutes).
  • The run on CL 220722 started at 5:48 PM and completed at 6:28 PM (40 minutes).

In contrast, a regular TryBot typically caps out around 10 minutes (#32632), and we consider runs that take longer than 20 minutes to be unacceptably slow (#36629, #36482).

Since there is nothing particularly special about the hardware needed to run the longtest builds (they're just large VMs), I think we should adjust the builder configuration to run the -longtest SlowBots with 4 or more shards each. That way, adding one of these bots to a CL will have minimal impact on end-to-end latency: we will have less of a disincentive to use them, and we will get much faster feedback to inform revert-or-fix decisions when one breaks.

CC @golang/osp-team

Labels: Builders, FeatureRequest, FrozenDueToAge, NeedsFix, ToolSpeed
