feat: Implement support for Github-based index, bypassing the registry #3023

Draft
wants to merge 2 commits into
base: master
from

Geod24
Member

@Geod24 Geod24 commented Apr 28, 2025

This implements a package 'index' similar to those used by Homebrew, Nix, Cargo, etc. It allows us to remove a SPOF in our critical infrastructure, since a GitHub outage would render the registry unusable anyway.

There are multiple steps to having a useful index:
- For transition purposes, we add a hidden command to Dub that exports an `index.yaml`;
- In the future, users should register their packages by adding an entry to `index.yaml`, the index definition file of the registry. This file is the source of all packages;
- `dub` now has a hidden `index-build` command that builds the index from an index definition file (`index.yaml`). It queries the various APIs to generate JSON index files that are stored under a pre-defined hierarchy;
- Finally, a `PackageSupplier` is added to make use of this new feature.

In the future, the registration process needs to be moved from the registry to GitHub to make this migration complete. This *can* be done by exposing a user-friendly interface on `code.dlang.org`, if making an MR to the index is deemed too complicated.

This is still a WIP, albeit quite complete now. Things that still need to be done:

  1. Description is not handled properly (it needs to be extracted from the recipe file);
  2. Consider ways to limit / reduce the impact on the user's disk over a long period (currently uses 32 MB of data);
  3. We need to have the index in production for a while before enabling it by default for users;
  4. Need to switch configy to a real JSON backend, as the YAML one doesn't handle strings well;
  5. Consider the scenario where the workspace is empty (e.g. in CI): do we always download a full cache?
  6. Fetching packages from GitLab and Bitbucket is not yet implemented;
  7. Support for non-global instances of GitLab and GitHub could be trivially implemented.

FYI @s-ludwig

github-actions bot commented Apr 28, 2025

✅ PR OK, no changes in deprecations or warnings

Total deprecations: 0

Total warnings: 0

Build statistics:

 statistics (-before, +after)
-executable size=5055872 bin/dub
-rough build time=61s
+executable size=5511744 bin/dub
+rough build time=65s
Full build output
DUB version 1.39.0, built on Mar 20 2025
LDC - the LLVM D compiler (1.40.1):
  based on DMD v2.110.0 and LLVM 19.1.7
  built with LDC - the LLVM D compiler (1.40.1)
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver3
  http://dlang.org - http://wiki.dlang.org/LDC


  Registered Targets:
    aarch64     - AArch64 (little endian)
    aarch64_32  - AArch64 (little endian ILP32)
    aarch64_be  - AArch64 (big endian)
    amdgcn      - AMD GCN GPUs
    arm         - ARM
    arm64       - ARM64 (little endian)
    arm64_32    - ARM64 (little endian ILP32)
    armeb       - ARM (big endian)
    avr         - Atmel AVR Microcontroller
    bpf         - BPF (host endian)
    bpfeb       - BPF (big endian)
    bpfel       - BPF (little endian)
    hexagon     - Hexagon
    lanai       - Lanai
    loongarch32 - 32-bit LoongArch
    loongarch64 - 64-bit LoongArch
    mips        - MIPS (32-bit big endian)
    mips64      - MIPS (64-bit big endian)
    mips64el    - MIPS (64-bit little endian)
    mipsel      - MIPS (32-bit little endian)
    msp430      - MSP430 [experimental]
    nvptx       - NVIDIA PTX 32-bit
    nvptx64     - NVIDIA PTX 64-bit
    ppc32       - PowerPC 32
    ppc32le     - PowerPC 32 LE
    ppc64       - PowerPC 64
    ppc64le     - PowerPC 64 LE
    r600        - AMD GPUs HD2XXX-HD6XXX
    riscv32     - 32-bit RISC-V
    riscv64     - 64-bit RISC-V
    sparc       - Sparc
    sparcel     - Sparc LE
    sparcv9     - Sparc V9
    spirv       - SPIR-V Logical
    spirv32     - SPIR-V 32-bit
    spirv64     - SPIR-V 64-bit
    systemz     - SystemZ
    thumb       - Thumb
    thumbeb     - Thumb (big endian)
    ve          - VE
    wasm32      - WebAssembly 32-bit
    wasm64      - WebAssembly 64-bit
    x86         - 32-bit X86: Pentium-Pro and above
    x86-64      - 64-bit X86: EM64T and AMD64
    xcore       - XCore
    xtensa      - Xtensa 32
   Upgrading project in /home/runner/work/dub/dub/
    Starting Performing "release" build using /opt/hostedtoolcache/dc/ldc2-1.40.1/x64/ldc2-1.40.1-linux-x86_64/bin/ldc2 for x86_64.
    Building dub 1.39.0-rc.1+commit.54.gcf379ca5: building configuration [application]
     Linking dub
STAT:statistics (-before, +after)
STAT:executable size=5511744 bin/dub
STAT:rough build time=65s

@Geod24
Member Author

Geod24 commented Apr 28, 2025

This is what the output looks like for Configy:

 % cat index-build-result/co/yg/configy
{"version":0,"name":"configy","description":"An automatic YAML to struct configuration parser for dlang","source":{"kind":"github","owner":"dlang-community","project":"configy"},"versions":[{"version":"2.1.0","subs":[{"configurations":[{"dependencies":{"dyaml":">=0.8.4"},"name":""},{"name":"library"},{"name":"debug"},{"name":"unittest"}],"path":"dub.json","cache":{"etag":"W/\"f414b8246bf2fe14ae009a0952945d8d25c824d8\"","last_modified":"Thu, 10 Apr 2025 00:06:18 GMT"},"name":""}],"commit":"f161db12e7f6462959b9f42edd4301a252f13dfe"},{"version":"2.0.0","subs":[{"configurations":[{"dependencies":{"dyaml":">=0.8.4"},"name":""},{"name":"library"},{"name":"debug"},{"name":"unittest"}],"path":"dub.json","cache":{"etag":"W/\"f414b8246bf2fe14ae009a0952945d8d25c824d8\"","last_modified":"Thu, 10 Apr 2025 00:06:18 GMT"},"name":""}],"commit":"c66665417289da4e8f8ede16a96e8158efd499b5"},{"version":"1.0.0","subs":[{"configurations":[{"dependencies":{"dyaml":">=0.8.4"},"name":""},{"name":"library"},{"name":"debug"},{"name":"unittest"}],"path":"dub.json","cache":{"etag":"W/\"f414b8246bf2fe14ae009a0952945d8d25c824d8\"","last_modified":"Thu, 10 Apr 2025 00:06:18 GMT"},"name":""}],"commit":"110cc0600324f091773d979284d2948a9ddbb975"}],"cache":{"etag":"W/\"3d57862ee06488642331352dfd274351c5417a254c7c7f0523fab18fee8d9d36\"","last_modified":"Tue, 15 Apr 2025 15:51:57 GMT"}}

Or, pretty-printed:

{
  "version": 0,
  "name": "configy",
  "description": "An automatic YAML to struct configuration parser for dlang",
  "source": {
    "kind": "github",
    "owner": "dlang-community",
    "project": "configy"
  },
  "versions": [
    {
      "version": "2.1.0",
      "subs": [
        {
          "configurations": [
            {
              "dependencies": {
                "dyaml": ">=0.8.4"
              },
              "name": ""
            },
            {
              "name": "library"
            },
            {
              "name": "debug"
            },
            {
              "name": "unittest"
            }
          ],
          "path": "dub.json",
          "cache": {
            "etag": "W/\"f414b8246bf2fe14ae009a0952945d8d25c824d8\"",
            "last_modified": "Thu, 10 Apr 2025 00:06:18 GMT"
          },
          "name": ""
        }
      ],
      "commit": "f161db12e7f6462959b9f42edd4301a252f13dfe"
    },
    {
      "version": "2.0.0",
      "subs": [
        {
          "configurations": [
            {
              "dependencies": {
                "dyaml": ">=0.8.4"
              },
              "name": ""
            },
            {
              "name": "library"
            },
            {
              "name": "debug"
            },
            {
              "name": "unittest"
            }
          ],
          "path": "dub.json",
          "cache": {
            "etag": "W/\"f414b8246bf2fe14ae009a0952945d8d25c824d8\"",
            "last_modified": "Thu, 10 Apr 2025 00:06:18 GMT"
          },
          "name": ""
        }
      ],
      "commit": "c66665417289da4e8f8ede16a96e8158efd499b5"
    },
    {
      "version": "1.0.0",
      "subs": [
        {
          "configurations": [
            {
              "dependencies": {
                "dyaml": ">=0.8.4"
              },
              "name": ""
            },
            {
              "name": "library"
            },
            {
              "name": "debug"
            },
            {
              "name": "unittest"
            }
          ],
          "path": "dub.json",
          "cache": {
            "etag": "W/\"f414b8246bf2fe14ae009a0952945d8d25c824d8\"",
            "last_modified": "Thu, 10 Apr 2025 00:06:18 GMT"
          },
          "name": ""
        }
      ],
      "commit": "110cc0600324f091773d979284d2948a9ddbb975"
    }
  ],
  "cache": {
    "etag": "W/\"3d57862ee06488642331352dfd274351c5417a254c7c7f0523fab18fee8d9d36\"",
    "last_modified": "Tue, 15 Apr 2025 15:51:57 GMT"
  }
}
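
For anyone who wants to consume such an entry, here is a minimal Python sketch that walks the structure shown above and lists each version with its commit and the dependencies found in its configurations. The field names are taken from the output above; the helper name `summarize` and the trimmed sample are illustrative assumptions, not part of Dub or of this PR.

```python
import json

# Trimmed copy of the entry above (one version, cache metadata omitted).
ENTRY = json.loads("""
{
  "version": 0,
  "name": "configy",
  "source": {"kind": "github", "owner": "dlang-community", "project": "configy"},
  "versions": [
    {
      "version": "2.1.0",
      "subs": [
        {
          "configurations": [
            {"dependencies": {"dyaml": ">=0.8.4"}, "name": ""},
            {"name": "library"}
          ],
          "path": "dub.json",
          "name": ""
        }
      ],
      "commit": "f161db12e7f6462959b9f42edd4301a252f13dfe"
    }
  ]
}
""")


def summarize(entry):
    """Return (version, commit, dependencies) tuples for one index entry.

    Hypothetical helper: it only assumes the keys visible in the sample.
    """
    rows = []
    for ver in entry["versions"]:
        deps = {}
        for sub in ver.get("subs", []):
            for conf in sub.get("configurations", []):
                # Configurations without dependencies simply omit the key.
                deps.update(conf.get("dependencies", {}))
        rows.append((ver["version"], ver["commit"], deps))
    return rows


for version, commit, deps in summarize(ENTRY):
    print(version, commit[:7], deps)
```

Merging the dependencies of all configurations into one dictionary is a simplification for display purposes; a real resolver would keep them per configuration.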

One way to reduce the bloat would be to add another step that publishes only the relevant data (currently the index stores all the etags / last-modified values to avoid needlessly querying GitHub). I would also like to look into package popularity (number of stars, forks, etc.).
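
As a rough illustration of such a publishing step, the Python sketch below drops every `cache` object from an entry and compares the compact JSON size before and after. It assumes the `cache` key is the only build-time metadata, as in the sample above; `strip_cache` is a hypothetical helper, not something this PR implements.

```python
import json


def strip_cache(node):
    """Return a copy of `node` with every "cache" object removed, at any depth."""
    if isinstance(node, dict):
        return {k: strip_cache(v) for k, v in node.items() if k != "cache"}
    if isinstance(node, list):
        return [strip_cache(item) for item in node]
    return node


# Shortened entry with the same shape as the configy output above.
entry = {
    "version": 0,
    "name": "configy",
    "versions": [
        {
            "version": "2.1.0",
            "subs": [
                {
                    "path": "dub.json",
                    "cache": {
                        "etag": 'W/"f414b8246bf2fe14ae009a0952945d8d25c824d8"',
                        "last_modified": "Thu, 10 Apr 2025 00:06:18 GMT",
                    },
                    "name": "",
                }
            ],
            "commit": "f161db12e7f6462959b9f42edd4301a252f13dfe",
        }
    ],
    "cache": {
        "etag": 'W/"3d57862ee06488642331352dfd274351c5417a254c7c7f0523fab18fee8d9d36"',
        "last_modified": "Tue, 15 Apr 2025 15:51:57 GMT",
    },
}

published = strip_cache(entry)
before = len(json.dumps(entry, separators=(",", ":")))
after = len(json.dumps(published, separators=(",", ":")))
print(f"{before} -> {after} bytes")
```

The build-side copy would keep its `cache` objects so that conditional requests still work; only the published copy is trimmed.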

@Geod24 Geod24 force-pushed the mlang/RegistryIndex branch 2 times, most recently from 1aa5d9f to 8e86aa7 Compare April 28, 2025 04:57
@Geod24
Member Author

Geod24 commented Apr 28, 2025

I can see 103 dead packages:

     Warning The following packages errored out:
        - "dzmq"
        - "pc"
        - "civge"
        - "btreader"
        - "stripe-d"
        - "murmurhash3"
        - "libhell"
        - "interfacing"
        - "m3d"
        - "s3"
        - "d-leveldb-comparator"
        - "libco"
        - "gpgerror-d"
        - "gpgme-d"
        - "bgfx-d"
        - "llvm-d-2"
        - "ansi"
        - "bgfx-extras-d"
        - "liblzma"
        - "iupd"
        - "nukleard"
        - "nluad"
        - "imd"
        - "cdd"
        - "clipboard"
        - "libuid"
        - "soapclient"
        - "mogud-benchmark"
        - "gdal2"
        - "riffedit"
        - "tmarsteel-dpipe"
        - "quantum-random"
        - "yaml-d"
        - "dwtlib"
        - "rdub"
        - "parsed"
        - "gdub"
        - "nice-curses"
        - "dich"
        - "checkit"
        - "big-d"
        - "litecraft-bgfx"
        - "composer"
        - "os1"
        - "kisaragi"
        - "decimal"
        - "dfunkt"
        - "pterm"
        - "struct2mongo"
        - "vibedstruct2mongo"
        - "indexed-relation"
        - "mongo"
        - "rm-rf-exe"
        - "dconfig"
        - "gamenetworkingsockets_d"
        - "derelict-cufft"
        - "sanspam"
        - "ben-eater-8bit-emulator"
        - "discord-d"
        - "psychometry"
        - "firecracker_d"
        - "sml"
        - "evael"
        - "bindbc-assimp"
        - "dunex-auth"
        - "sbylib"
        - "repl-d"
        - "fmt-d"
        - "plist"
        - "grpc-d-core"
        - "grpc-d-interop"
        - "jar"
        - "nudge-d"
        - "webkit2gtkd"
        - "command"
        - "dstruct-orm"
        - "soundpipe-d"
        - "lhl"
        - "pa"
        - "erasure"
        - "jengine"
        - "libcbor"
        - "sweatyballs"
        - "dweb"
        - "feature"
        - "d2asm"
        - "dlsplus"
        - "option"
        - "cli-args"
        - "result"
        - "dlang_raylib"
        - "boxed"
        - "datefmt-redthing1"
        - "econf"
        - "dtiled-redthing1"
        - "faiss-d"
        - "mads"
        - "teacup"
        - "nullable-sugar"
        - "bert-d"
        - "flant5-d"
        - "hellodub"
        - "self"

@Geod24 Geod24 force-pushed the mlang/RegistryIndex branch from 8e86aa7 to e6ed130 Compare April 28, 2025 07:07
@Geod24 Geod24 force-pushed the mlang/RegistryIndex branch from e6ed130 to b48f09e Compare May 13, 2025 00:30
Geod24 added 2 commits May 13, 2025 02:53
This way we can parse things that are not YAML 1.1 compliant.
@s-ludwig
Member

While I'm not opposed to this approach in general, we should set the bar for this quite high:

  • Support for all currently supported platforms (GitHub, GitLab, Bitbucket, Gitea)
  • Support for private repositories
  • Don't regress in terms of usability (e.g. being able to register/verify packages through code.dlang.org)
  • Don't regress in terms of performance (e.g. a lengthy index update to pull in changes)
  • Don't lose additional registry features, such as download statistics
  • Review this in terms of the possibility of backing up package sources to avoid breakage when a package repository disappears

The thing I'm not quite sure about is what we gain by using GitHub to store the list of packages. If that's the only centrally served asset, that should also be trivial to do from a dlang server that is independent of the registry web frontend.

It should be mentioned that we already had a working fallback mechanism with <codemirror.dlang.org> et al., but at some point along the way that obviously broke. We should definitely get that fixed again and maybe look into improving it (for example, skipping a server that timed out or yielded a 5xx error).
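
To make the suggested improvement concrete, here is a minimal Python sketch of a client that tries servers in order and skips any that time out or answer with a 5xx status. It is not Dub's actual implementation: the function name `fetch_with_fallback`, the injectable `fetch` parameter, and the `example` host names are assumptions made for illustration.

```python
import io
import urllib.error
import urllib.request


def fetch_with_fallback(urls, fetch=None, timeout=5.0):
    """Try each URL in order; skip servers that time out or answer 5xx."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()

    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                # A 4xx is a real answer (e.g. unknown package): do not mask it.
                raise
            errors.append((url, f"HTTP {exc.code}"))
        except OSError as exc:
            # Covers URLError, connection failures and timeouts.
            errors.append((url, repr(exc)))
    raise RuntimeError(f"all servers failed: {errors}")


def fake_fetch(url):
    """Stand-in for the network so the sketch runs offline (hypothetical hosts)."""
    if "down" in url:
        raise urllib.error.HTTPError(url, 503, "Service Unavailable", None, io.BytesIO())
    if "slow" in url:
        raise TimeoutError("timed out")
    return b"ok"


used, body = fetch_with_fallback(
    ["https://down.example/", "https://slow.example/", "https://up.example/"],
    fetch=fake_fetch,
)
print(used, body)
```

Re-raising 4xx responses is a design choice in this sketch: a "not found" from a healthy server is treated as authoritative rather than as a reason to try the next mirror.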

@Geod24
Member Author

Geod24 commented May 22, 2025

> Support for all currently supported platforms (GitHub, GitLab, Bitbucket, Gitea)

I missed Gitea. The rest are supported. Note that there is currently no public package using Gitea, but it shouldn't be hard to add.

> Support for private repositories

I think this raises multiple questions. Do we want to have private repositories on the public index? I'd say no. So it's more about supporting different repositories, which this should be able to do, but I haven't extensively tested it yet.

> Don't regress in terms of usability (e.g. being able to register/verify packages through code.dlang.org)

Still need to do that, but definitely on the list.

> Don't regress in terms of performance (e.g. a lengthy index update to pull in changes)

Agreed - also need to make sure we make it cache / CI friendly.

> Don't lose additional registry features, such as download statistics

We could add various metrics to Dub, but we could also rely on GitHub's metrics / Stars / Forks.

> Review this in terms of the possibility of backing up package sources to avoid breakage when a package repository disappears

I don't think this affects our ability to back up packages in any way, positively or negatively. However, it makes ownership transfer much easier (because there's no longer a notion of ownership), and thus reviving a dead package no longer needs to involve an administrator.

> The thing I'm not quite sure about is what we gain by using GitHub to store the list of packages. If that's the only centrally served asset, that should also be trivial to do from a dlang server that is independent of the registry web frontend.

A lot less to maintain, and a well-known access model.

> It should be mentioned that we already had a working fallback mechanism with <codemirror.dlang.org> et al., but at some point along the way that obviously broke. We should definitely get that fixed again and maybe look into improving it (for example, skipping a server that timed out or yielded a 5xx error).

Agreed, we need to improve the client side of things. But if we can remove most concerns on the server side, that'll be a win in terms of work to be done.
