Skip to content

[clangd] Add includes from source to non-self-contained headers #72479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

sr-tream
Copy link
Contributor

@sr-tream sr-tream commented Nov 16, 2023

It's a hack to solve include headers from source to non-self-contained headers

Limitations:

  • header file must be matched by isHeaderFile without language options
  • non-self-contained header must be included by source
  • AST for source must be cached before open non-self-contained header
  • resolved only includes, without macros and symbols from source file

Example:

// header

void hello(std::string Name);
// source
#include <string>

#include "header.h"

#include <iostream>

int main(){
  hello("World");
  return 0;
}

void hello(std::string Name){
  std::cout << "Hello, " << Name << "\n";
}

Current clangd behavior for header - show error Use of undeclared identifier 'std'

With this PR, clangd adds include to command line for header

Partially resolves clangd/clangd#45

@llvmbot llvmbot added the clangd label Nov 16, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 16, 2023

@llvm/pr-subscribers-clangd

Author: SR_team (sr-tream)

Changes

It's a hack to solve include headers from source to non-self-contained headers

Limitations:

  • header file must be matched by isHeaderFile without language options
  • non-self-contained header must be included by source
  • AST for source must be cached before open non-self-contained header
  • resolved only includes, without macros and symbols from source file

Example:

// header

void hello(std::string Name);
// source
#include &lt;string&gt;

#include "header.h"

#include &lt;iostream&gt;

int main(){
  hello("World");
  return 0;
}

void hello(std::string Name){
  std::cout &lt;&lt; "Hello, " &lt;&lt; Name &lt;&lt; "\n";
}

Current clangd behavior for header - show error Use of undeclared identifier 'std'

With this PR clangd add include <string> to command line for header


Full diff: https://github.com/llvm/llvm-project/pull/72479.diff

1 Files Affected:

  • (modified) clang-tools-extra/clangd/TUScheduler.cpp (+50-3)
diff --git a/clang-tools-extra/clangd/TUScheduler.cpp b/clang-tools-extra/clangd/TUScheduler.cpp
index 324ba1fc8cb8952..208a723d8b7ea8f 100644
--- a/clang-tools-extra/clangd/TUScheduler.cpp
+++ b/clang-tools-extra/clangd/TUScheduler.cpp
@@ -52,6 +52,7 @@
 #include "Config.h"
 #include "Diagnostics.h"
 #include "GlobalCompilationDatabase.h"
+#include "HeaderSourceSwitch.h"
 #include "ParsedAST.h"
 #include "Preamble.h"
 #include "clang-include-cleaner/Record.h"
@@ -64,6 +65,7 @@
 #include "support/Threading.h"
 #include "support/Trace.h"
 #include "clang/Basic/Stack.h"
+#include "clang/Driver/Driver.h"
 #include "clang/Frontend/CompilerInvocation.h"
 #include "clang/Tooling/CompilationDatabase.h"
 #include "llvm/ADT/FunctionExtras.h"
@@ -881,10 +883,30 @@ void ASTWorker::update(ParseInputs Inputs, WantDiagnostics WantDiags,
         }
       }
     }
-    if (Cmd)
+    if (!Cmd)
+      Cmd = CDB.getFallbackCommand(FileName);
+    if (Cmd) {
+      if (!Inputs.CompileCommand.CommandLine.empty()) {
+        auto PosArg =
+            std::find(Cmd->CommandLine.begin(), Cmd->CommandLine.end(), "--");
+        SmallVector<const char *, 16> TmpArgv;
+        for (const std::string &S : Cmd->CommandLine)
+          TmpArgv.push_back(S.c_str());
+        auto ClangCLMode =
+            !TmpArgv.empty() &&
+            driver::IsClangCL(driver::getDriverMode(
+                TmpArgv.front(), llvm::ArrayRef(TmpArgv).slice(1)));
+        for (auto &&Arg : Inputs.CompileCommand.CommandLine) {
+          if (ClangCLMode)
+            Arg.replace(0, 8, "/FI"); // 8 - strlen("-include")
+          if (PosArg == Cmd->CommandLine.end())
+            Cmd->CommandLine.push_back(std::move(Arg));
+          else
+            PosArg = std::next(Cmd->CommandLine.insert(PosArg, std::move(Arg)));
+        }
+      }
       Inputs.CompileCommand = std::move(*Cmd);
-    else
-      Inputs.CompileCommand = CDB.getFallbackCommand(FileName);
+    }
 
     bool InputsAreTheSame =
         std::tie(FileInputs.CompileCommand, FileInputs.Contents) ==
@@ -1684,6 +1706,31 @@ bool TUScheduler::update(PathRef File, ParseInputs Inputs,
     ContentChanged = true;
     FD->Contents = Inputs.Contents;
   }
+  if (isHeaderFile(File)) {
+    std::string SourceFile = HeaderIncluders->get(File);
+    if (SourceFile.empty()) {
+      auto VFS = Inputs.TFS->view(Inputs.CompileCommand.Directory);
+      if (auto CorrespondingFile =
+              clang::clangd::getCorrespondingHeaderOrSource(File,
+                                                            std::move(VFS)))
+        SourceFile = *CorrespondingFile;
+    }
+    if (!SourceFile.empty()) {
+      std::unique_ptr<FileData> &FD = Files[SourceFile];
+      if (FD && FD->Worker->isASTCached()) {
+        if (auto AST = IdleASTs->take(FD->Worker.lock().get())) {
+          auto &Headers = AST.value()->getIncludeStructure().MainFileIncludes;
+          for (const auto &H : Headers) {
+            if (H.Resolved == File)
+              break;
+            auto CmdLine = "-include" + H.Resolved;
+            Inputs.CompileCommand.CommandLine.push_back(std::move(CmdLine));
+          }
+          IdleASTs->put(FD->Worker.lock().get(), std::move(AST.value()));
+        }
+      }
+    }
+  }
   FD->Worker->update(std::move(Inputs), WantDiags, ContentChanged);
   // There might be synthetic update requests, don't change the LastActiveFile
   // in such cases.

@sr-tream sr-tream force-pushed the non-self-contained-headers branch 5 times, most recently from 76bb83a to bd6a6d0 Compare November 17, 2023 20:14
Copy link

github-actions bot commented Nov 17, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 6da4ecdf9285225ccc8fa4441b7e9f65e8f4f49c 35456a46409862a31014161978e56f9ad184a7f1 -- clang-tools-extra/clangd/TUScheduler.cpp
View the diff from clang-format here.
diff --git a/clang-tools-extra/clangd/TUScheduler.cpp b/clang-tools-extra/clangd/TUScheduler.cpp
index df9b5e715a..fa4238ff75 100644
--- a/clang-tools-extra/clangd/TUScheduler.cpp
+++ b/clang-tools-extra/clangd/TUScheduler.cpp
@@ -624,7 +624,8 @@ public:
          const TUScheduler::Options &Opts, ParsingCallbacks &Callbacks);
   ~ASTWorker();
 
-  void update(ParseInputs Inputs, WantDiagnostics, bool ContentChanged, std::optional<std::vector<std::string>> SourceInclides);
+  void update(ParseInputs Inputs, WantDiagnostics, bool ContentChanged,
+              std::optional<std::vector<std::string>> SourceInclides);
   void
   runWithAST(llvm::StringRef Name,
              llvm::unique_function<void(llvm::Expected<InputsAndAST>)> Action,
@@ -859,7 +860,8 @@ ASTWorker::~ASTWorker() {
 }
 
 void ASTWorker::update(ParseInputs Inputs, WantDiagnostics WantDiags,
-                       bool ContentChanged, std::optional<std::vector<std::string>> SourceInclides) {
+                       bool ContentChanged,
+                       std::optional<std::vector<std::string>> SourceInclides) {
   llvm::StringLiteral TaskName = "Update";
   auto Task = [=]() mutable {
     // Get the actual command as `Inputs` does not have a command.
@@ -1732,7 +1734,8 @@ bool TUScheduler::update(PathRef File, ParseInputs Inputs,
       }
     }
   }
-  FD->Worker->update(std::move(Inputs), WantDiags, ContentChanged, SourceInclides);
+  FD->Worker->update(std::move(Inputs), WantDiags, ContentChanged,
+                     SourceInclides);
   // There might be synthetic update requests, don't change the LastActiveFile
   // in such cases.
   if (ContentChanged)

@sr-tream sr-tream force-pushed the non-self-contained-headers branch 3 times, most recently from 06eaf30 to c85ae98 Compare November 18, 2023 07:08
@sr-tream sr-tream force-pushed the non-self-contained-headers branch from c85ae98 to 35456a4 Compare November 18, 2023 07:21
@HighCommander4
Copy link
Collaborator

@sr-tream just so you know, every time you force-push a PR, github sends every subscriber an email notification

@sr-tream
Copy link
Contributor Author

@sr-tream just so you know, every time you force-push a PR, github sends every subscriber an email notification

Sorry for flood, I will try to avoid this from now on

@kadircet
Copy link
Member

hi! thanks for the interest @sr-tream but I am afraid this is likely to cause disruption in more cases than it might improve.

apart from technical details like the threading concerns and reliance on certain variants that we don't really have (eg order of includes); at a high level the idea of "finding a representative source file for the header and replicating the PP state" is hard to work in general.

The current approach of finding a candidate through HeaderIncluders will pick an arbitrary source file that transitively includes the header you're interested in. Hence you'll actually create a set of -include XX commands, that might (and probably will) recursively include the header itself. Also there's nothing detecting a failure in processing of the compilation unit, this is done without checking self-containedness (which is again hard to check) hence will definitely regress both performance and correctness for self-contained headers as well.

sorry for forgetting to hit send on friday, i see that you did some more iterations on the weekend. but i think it'd be better to discuss the general approach first, to see if that makes sense. not only to ensure non-self-contained headers can be handled properly rather than adding an edge case where things look like they're working (but they're not, hence resulting in more bug reports and hard-to-maintain fixes), but also to make sure they're not regressing anything for regular code paths.

@sr-tream
Copy link
Contributor Author

apart from technical details like the threading concerns and reliance on certain variants that we don't really have (eg order of includes); at a high level the idea of "finding a representative source file for the header and replicating the PP state" is hard to work in general.

The current approach of finding a candidate through HeaderIncluders will pick an arbitrary source file that transitively includes the header you're interested in. Hence you'll actually create a set of -include XX commands, that might (and probably will) recursively include the header itself. Also there's nothing detecting a failure in processing of the compilation unit, this is done without checking self-containedness (which is again hard to check) hence will definitely regress both performance and correctness for self-contained headers as well.

We can limit ourselves to only the paired source file, and cancel inclusions if the source file does not include a non-self-contained header directly. This will reduce the number of situations in which this hack works, but will avoid the problems you describe

@kadircet
Copy link
Member

We can limit ourselves to only the paired source file, and cancel inclusions if the source file does not include a non-self-contained header directly. This will reduce the number of situations in which this hack works, but will avoid the problems you describe

deducing the paired source file is also a heuristic, which will likely go wrong at times. more importantly it'll still regress the behavior for self-contained headers, even if we ignore that, it still creates this new illusion of working when the user is lucky, and will trigger extra maintenance load (that's IMHO not justified) by already thin stretched maintainers.

@sr-tream sr-tream closed this Nov 20, 2023
@HighCommander4
Copy link
Collaborator

HighCommander4 commented Nov 23, 2023

A somewhat narrow use case where this approach might be workable is unity builds (groups of source files batched together to be compiled as a single translation unit for build performance).

That's a situation where the non-self-contained files of interest (the individual source files that are included in such a batch) have precisely one includer, their incoming preprocessor state can be replicated precisely using -include flags, and the intervention can be limited to them (it wouldn't apply to header files).

Of course, the including file is not likely to be open in this case, so we'd want something like an include graph stored in the index (as proposed in clangd/clangd#123) to consult.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support non-self-contained files
4 participants