Skip to content

Coordination for structured error messages #24

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 2, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
237 changes: 237 additions & 0 deletions proposals/000-error-messages.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,237 @@
## Introduction

This proposal seeks technical coordination from the Haskell Foundation
for improving the interop story between GHC and HLS. Once that is
done well, we might imagine taking our gained knowledge to improve
other interop around error messages.
While much of the work may be doable by volunteers, the HF would play
a role in harnessing and corralling the volunteers, as well as coordinating
common APIs between tools that are both easy to implement and easy to use.
The HF may also be instrumental in managing an error code namespace, shared
among all tooling central to Haskell.

## Background

Currently, there is no discipline around error messages. This lack of
structure manifests itself in a number of ways:

- HLS must parse the error messages that GHC
produces. This is fragile, wasteful, and hard to keep up-to-date. For
example, the HLS looks to see if a GHC extension name appears in an error
message, in order to allow the user to automatically enable it via a pragma.
But since `KindSignatures` is a substring of `StandaloneKindSignatures`, any
message mentioning the latter causes HLS to suggest both enabling `KindSignatures`
and `StandaloneKindSignatures` -- even though only `StandaloneKindSignatures`
would actually work. While there is a workaround here, we can see that
better communication between GHC and HLS would avoid this class of problem.

- Many GHC error messages refer to advanced concepts. This is unavoidable,
as Haskell has advanced features. However, telling a user that their rigid
type variable does not unify with a type because there is a kind mismatch
is utterly bewildering to Haskell learners. Applying structure to error messages
would allow for the creation of an error-message index that could explain
what the messages mean -- and how to fix the errors.

- Given that tools are increasingly working with one another and invoking
one another, it can be hard to know who exactly is producing an error
message. In one recent example that happened to me, I was trying to get
GHC to work with a GHC plugin, and I got a baffling error. It took me
more than an hour, if I recall, to discover that the problem was with
Haddock (I forget the details) and that I just needed to `--disable-documentation`.

There is already work in this area, within GHC. For the past few years, GHC
has slowly been converting its error messages to be encoded in data constructors,
not just as (fancy) strings. [This wiki page](https://gitlab.haskell.org/ghc/ghc/-/wikis/Errors-as-(structured)-values)
and [this blog post](https://well-typed.com/blog/2021/08/the-new-ghc-diagnostic-infrastructure/) describe
roughly the state of play. However, this work currently lacks a very important
ingredient: clients. That is, if GHC is exporting new, fancy datatypes encoding
its error messages, are these datatypes of use to HLS? We've reached out
to potential clients for feedback, but the best response we've gotten is something
along the lines of "sure, looks good". That's encouraging, but I would want to have
a little more coordination to make sure that the interface GHC is building is one
that can be easily consumed. The HF could help here by coordinating this
communication between projects.

The website describing error messages and error-generator identification
are both fresh in this proposal.

## Motivation

- It is better for tools to collaborate by passing structured data than
by sending strings back and forth. Structured error messages will thus
accelerate the development of powerful editor integrations and other
code analysis tools.

- Establishing a website describing error messages will make it a standard

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GHC has a nice website of documentation for users: the User's Guide. It's versioned, built nicely from source, and the source lives alongside GHC where the error definitions go. For that reason I think it would be a good proof-of-concept/low-hanging-fruit to start with a directory of GHC's error messages in the User's Guide. If that went well, we could think about how to re-present the same content in another place if that seemed useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are pluses and minuses to using the user manual. (I've added a note later in the document that the manual might be a good place.)

Pluses:

  • It seems easier to ensure that the documentation keeps up with the implementation.
  • It seems easier to ensure a high quality of documentation (via the normal PR review).
  • The user manual is already web established, with all of its infrastructure in place.

Minuses:

  • Higher barrier-of-entry to edit the manual, due to GHC's review process.
  • No easy way to just leave a comment. I would see the error-message resource as benefiting from user comments, providing extra examples and insights.
  • Hard to generalize beyond GHC (when that is ready).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have thought the source of truth would belong in a code repository -- either GHC itself or a dedicated repo. But of course it would be ideal if we could run a tool to extract docs for inclusion in the GHC manaual, which could be incorporated into the GHC manual build pipeline.

Would this not answer to everyone's needs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may be agreeing. GHC's manual is in its code repository -- and in a format that can be extracted to the GHC manual website -- so that conforms to your suggestion. Or a separate website would have its content generated from some source repository ("a dedicated repo"), so that also conforms. But I think maybe there is something deeper you're suggesting, but I've missed. Have I captured your intent here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought as much -- looks like we are all in violent agreement.

reference in the Haskell community and flatten the learning curve to
new Haskellers.

While the two main goals of this proposal (conversion of all error messages
to use datatypes; assigning error codes / creating a website) could be
considered separately, I think they make sense together in this proposal.
The second goal depends on the first, and it seems likely that many of
the same potential volunteers will be interested in both. That said, it would
be fine for, e.g. the HFTT to accept only one part of this proposal without
the other, or simply not to commit resources until a mid-way review were conducted.

## Goals

1. When compiling a program, HLS queries GHC for error messages and receives
structured errors, not strings. HLS can then use the information in these structures
to offer repairs to the user or other options.

1. All GHC error messages include a code. These codes can be searched for on a website
that explains the error message, with examples of what causes it and how the error
might be fixed.

1. Stretch goal: Building on the success of the HLS/GHC integration around error
messages, other central tooling adopts a similar approach. This would, for example,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HLS/GHC integration is different in kind to, say HLS's integration with other tools. Many of the most difficult ones (mostly the build tools) are invoked as binaries, which makes passing structured errors more difficult. Of course, this could be overcome (many tools these days have a --json flag for giving structured output as JSON), but it's much easier when the interface is just a Haskell library interface.

enable the possibility that HLS can report more informative configuration errors
to users, or even to repair some of the problems itself.

1. Stretch goal: The HF would establish a global namespace for Haskell-tool
error message codes, where each tool is assigned (say) a prefix it should use
for any error codes. The HF would then encourage tools to use these prefixes
in error codes when presenting messages.

## What the Haskell Foundation Would Do

This section is meant to be suggestive of the concrete activity that would support
this proposal. It is possible the HFTT or other HF people would have an alternative
approach, which is fine, too.

1. Devote the time of an HF person, hereby called the Coordinator, to stay on top of
this project. The Coordinator could be an HF employee, an in-kind donation of labor,
or perhaps a dedicated and trustworthy volunteer.
I think it would be reasonable to timebox this work at 5 hours / week from
the Coordinator.

1. A key task of the Coordinator is to source volunteers to help with this initiative.
Accordingly, the Coordinator would be responsible for publicity around this plan, as well
as thinking creatively about ways to attract volunteers. For example, it might be a fun
idea to plan a virtual hackathon with potential volunteers or to reward contributions
with t-shirts. I would expect the Coordinator to think creatively about how to source
the volunteers. Volunteer management is a primary requirement of the Coordinator; it is
assumed that the Coordinator is managing volunteers in parallel with all other tasks here.

1. The Coordinator would start by getting an exact handle on the state of structured
error messages in GHC, by working with current contributors (e.g. Alfredo di Napoli, Sam
Derbyshire, Richard Eisenberg) and looking at the GHC source code. The Coordinator
would then identify an area within GHC that would be an appropriate next step to add
similar structured error messages and source volunteers to contribute to that area.
The Coordinator would help to shepherd any GHC MRs that would arise as part of this work.

1. In parallel with the previous item, the Coordinator would work with representatives
from the HLS team to figure out how HLS might take advantage of the structured error messages
GHC already has. Even if HLS is not ready to merge yet, the Coordinator and HLS would

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that using the new structured errors will be straightforward. What will probably be non-straightforward and unpleasant is managing the compatibility between versions of GHC that don't have structured errors and those that do. Could be a good GSOC project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a good point. But maybe some design work within GHC will avoid this problem? I'm not sure. And we need to make sure that the errors are delivered in the right way. Maybe part of the reason we didn't get much feedback in the past is that it's obviously going to be easy.

work out a way to build a proof-of-concept based on the structured errors GHC already
has. This would validate the current API and increase the confidence in building on it.

1. Having established that the API is usable, the Coordinator would systematically work
through remaining error messages in GHC, directing volunteers to convert them to the
structured format.

1. As capacity is available, the Coordinator would also organize (or encourage a volunteer
to organize) a website where error messages could be explained. This might be a wiki,
or a git repository, or something exportable to e.g. readthedocs.io. This might even be
incorporated into the user manual. Figuring out a good
format would be the responsibility of the Coordinator, possibly by contacting stakeholders
with a survey or looking at other language communities.

1. The Coordinator would devise a scheme for assigning error code to messages. These might
be terse, inscrutable alphanumeric identifiers, or perhaps they would be human-readable.
The namespace would include the possibility of covering tools beyond just GHC, though
recursive hierarchy seems likely unnecessary. With the help of volunteers, the Coordinator
would add these error codes into the error-message API.

1. The Coordinator would continue to encourage volunteers to document error messages on
the error-message website, learning from early successes and failures.

1. If the project is going well and with community support, the Coordinator could look at
extending this idea to other tools. For example, perhaps Cabal or Stack could start to
deliver similar structured error messages -- with buy-in from those maintainers, of course.

## People

- **Performers:** The Coordinator, someone who will have time dedicated to this project. This person
would ideally be an HF employee or part of the portfolio of an in-kind donation of labor.

- **Reviewers:** The GHC team would review changes to GHC, while the HLS team would review changes to HLS.
The GHC and HLS teams would work together, coordinated by the Coordinator, to make an API that is useful
to both. Community volunteers would review the text of the website describing error messages. The Coordinator
would review the uptake of any website by examining analytics.

- **Stakeholders:** This would affect anyone who uses GHC, as the error codes would appear there. Key stakeholders
include the GHC and HLS maintainers, as well as educators, who would have access to Haskell learners who
would benefit from the results of this work.

## Resources

- The Coordinator would need to devote 5 hours / week.
- There would be a set of volunteers who would do much of the labor. If the volunteer pool runs low, the Coordinator
can do some of the technical work, as well.
- The GHC and HLS teams would have to devote some of their time to help support this initiative.

## Timeline

The timeline is highly dependent on the availability of volunteers to do the work. It thus seems
more sensible to timebox this effort at 5 hours of Coordination / week than to set a deadline for
completion. It would be sensible to review progress after 3 months to decide whether this project
is producing benefits (or is likely to soon).

## Lifecycle:

I don't think this really applies here. There would be a warm-up period at the beginning where the goal
is to source volunteers, but afterwards, it's all about keeping people moving forwards.

## Deliverables

1. A release of GHC where all of its error messages are structured.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is happening incrementally, right? Which version of GHC do they start to appear in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9.4 will be the first.


1. A release of HLS which consumes the structured error messages of GHC.

1. A website explaining at least 20 different errors produced by GHC. (More is better!)
I think we should set a modest goal of having 100 unique visitors to this website
over the course of a month.

1. A blog post (ideally written by the Coordinator) describing this process, as a way
of creating publicity for the HF.

## Outcomes

- With the structured interface to errors, tools such as HLS will be better equipped to
offer more power to users to manipulate and reason about code.

- The error-message cataloguing website will help Haskell learners (and, likely, some
old hands) understand error messages better.

## Risks

- It is currently unclear who would best serve as the Coordinator, which is why this
proposal leaves this role abstract. Accordingly, a risk is that there is no one suitable.
However, I still believe this proposal is worth considering (and perhaps approving) in this
state: it would then serve as a concrete task the HF could have when an appropriate
Coordinator arises. In the meantime, it could be used as an idea to show potential sponsors
who might want to know what initiatives the HF is considering or to use as part of a motivation
for expanding the HF employment base.

- Much of the work in this proposal is designed to be done by volunteers, working in parallel.
It is possible we will not find the right volunteers for this work. It is then possible
for the Coordinator to do more work themselves. In any case, trying to source volunteers for
this work could be an important learning experience in the lifetime of the HF, and it informs
the design of future initiatives.

- One risk is that the API being built around error messages is not useful to consumers.
This risk is intended to be mitigated by an early consultation with HLS.

- It is possible that the structured error messages will provide no opportunity for
improvement over the status quo. This is a risk the HFTT should consider. It might also
be worthwhile to reach out to HLS now to see what they think.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no, it will be so much better than the awful error message parsing that's going on now. It's buggy, painful, unnecessarily complicated code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. But having not heard voices from HLS say this loudly, I didn't want to be over-presumptuous.


- It is possible that no one will find their way to the error-index website, or that

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reason why the User's Guide is a good location. If we don't think users find it easy to find the User's Guide, then fixing that is great anyway!

the format chosen for the site will not resonate with users. The Coordinator would ideally
reach out to users to understand their needs better as the website is being designed
in order to mitigate this risk.

- It is possible that the extra structure will provide an obstacle to evolution within
GHC and slow development down there. I do not think this is likely, but it is conceivable.