Skip to content

[MLIR][LLVM] Recursion importer handle repeated self-references #87295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 8, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
188 changes: 131 additions & 57 deletions mlir/lib/Target/LLVMIR/DebugImporter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#include "mlir/IR/Location.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SetOperations.h"
#include "llvm/ADT/TypeSwitch.h"
#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/IR/Constants.h"
Expand All @@ -25,6 +26,10 @@ using namespace mlir;
using namespace mlir::LLVM;
using namespace mlir::LLVM::detail;

DebugImporter::DebugImporter(ModuleOp mlirModule)
: recursionPruner(mlirModule.getContext()),
context(mlirModule.getContext()), mlirModule(mlirModule) {}

Location DebugImporter::translateFuncLocation(llvm::Function *func) {
llvm::DISubprogram *subprogram = func->getSubprogram();
if (!subprogram)
Expand Down Expand Up @@ -246,42 +251,13 @@ DINodeAttr DebugImporter::translate(llvm::DINode *node) {
if (DINodeAttr attr = nodeToAttr.lookup(node))
return attr;

// If the node type is capable of being recursive, check if it's seen before.
auto recSelfCtor = getRecSelfConstructor(node);
if (recSelfCtor) {
// If a cyclic dependency is detected since the same node is being traversed
// twice, emit a recursive self type, and mark the duplicate node on the
// translationStack so it can emit a recursive decl type.
auto [iter, inserted] = translationStack.try_emplace(node, nullptr);
if (!inserted) {
// The original node may have already been assigned a recursive ID from
// a different self-reference. Use that if possible.
DistinctAttr recId = iter->second;
if (!recId) {
recId = DistinctAttr::create(UnitAttr::get(context));
iter->second = recId;
}
unboundRecursiveSelfRefs.back().insert(recId);
return cast<DINodeAttr>(recSelfCtor(recId));
}
}

unboundRecursiveSelfRefs.emplace_back();

auto guard = llvm::make_scope_exit([&]() {
if (recSelfCtor)
translationStack.pop_back();
// Register with the recursive translator. If it can be handled without
// recursing into it, return the result immediately.
if (DINodeAttr attr = recursionPruner.pruneOrPushTranslationStack(node))
return attr;

// Copy unboundRecursiveSelfRefs down to the previous level.
if (unboundRecursiveSelfRefs.size() == 1)
assert(unboundRecursiveSelfRefs.back().empty() &&
"internal error: unbound recursive self reference at top level.");
else
unboundRecursiveSelfRefs[unboundRecursiveSelfRefs.size() - 2].insert(
unboundRecursiveSelfRefs.back().begin(),
unboundRecursiveSelfRefs.back().end());
unboundRecursiveSelfRefs.pop_back();
});
auto guard = llvm::make_scope_exit(
[&]() { recursionPruner.popTranslationStack(node); });

// Convert the debug metadata if possible.
auto translateNode = [this](llvm::DINode *node) -> DINodeAttr {
Expand Down Expand Up @@ -318,22 +294,130 @@ DINodeAttr DebugImporter::translate(llvm::DINode *node) {
return nullptr;
};
if (DINodeAttr attr = translateNode(node)) {
// If this node was marked as recursive, set its recId.
if (auto recType = dyn_cast<DIRecursiveTypeAttrInterface>(attr)) {
if (DistinctAttr recId = translationStack.lookup(node)) {
attr = cast<DINodeAttr>(recType.withRecId(recId));
// Remove the unbound recursive ID from the set of unbound self
// references in the translation stack.
unboundRecursiveSelfRefs.back().erase(recId);
auto [result, isSelfContained] =
recursionPruner.finalizeTranslation(node, attr);
// Only cache fully self-contained nodes.
if (isSelfContained)
nodeToAttr.try_emplace(node, result);
return result;
}
return nullptr;
}

//===----------------------------------------------------------------------===//
// RecursionPruner
//===----------------------------------------------------------------------===//

/// Get the `getRecSelf` constructor for the translated type of `node` if its
/// translated DITypeAttr supports recursion. Otherwise, returns nullptr.
static function_ref<DIRecursiveTypeAttrInterface(DistinctAttr)>
getRecSelfConstructor(llvm::DINode *node) {
using CtorType = function_ref<DIRecursiveTypeAttrInterface(DistinctAttr)>;
return TypeSwitch<llvm::DINode *, CtorType>(node)
.Case([&](llvm::DICompositeType *) {
return CtorType(DICompositeTypeAttr::getRecSelf);
})
.Default(CtorType());
}

DINodeAttr DebugImporter::RecursionPruner::pruneOrPushTranslationStack(
llvm::DINode *node) {
// If the node type is capable of being recursive, check if it's seen
// before.
auto recSelfCtor = getRecSelfConstructor(node);
if (recSelfCtor) {
// If a cyclic dependency is detected since the same node is being
// traversed twice, emit a recursive self type, and mark the duplicate
// node on the translationStack so it can emit a recursive decl type.
auto [iter, inserted] = translationStack.try_emplace(node);
if (!inserted) {
// The original node may have already been assigned a recursive ID from
// a different self-reference. Use that if possible.
DIRecursiveTypeAttrInterface recSelf = iter->second.recSelf;
if (!recSelf) {
DistinctAttr recId = nodeToRecId.lookup(node);
if (!recId) {
recId = DistinctAttr::create(UnitAttr::get(context));
nodeToRecId[node] = recId;
}
recSelf = recSelfCtor(recId);
iter->second.recSelf = recSelf;
}
// Inject the self-ref into the previous layer.
translationStack.back().second.unboundSelfRefs.insert(recSelf);
return cast<DINodeAttr>(recSelf);
}
}

// Only cache fully self-contained nodes.
if (unboundRecursiveSelfRefs.back().empty())
nodeToAttr.try_emplace(node, attr);
return attr;
return lookup(node);
}

std::pair<DINodeAttr, bool>
DebugImporter::RecursionPruner::finalizeTranslation(llvm::DINode *node,
DINodeAttr result) {
// If `node` is not a potentially recursive type, it will not be on the
// translation stack. Nothing to set in this case.
if (translationStack.empty())
return {result, true};
if (translationStack.back().first != node)
return {result, translationStack.back().second.unboundSelfRefs.empty()};

TranslationState &state = translationStack.back().second;

// If this node is actually recursive, set the recId onto `result`.
if (DIRecursiveTypeAttrInterface recSelf = state.recSelf) {
auto recType = cast<DIRecursiveTypeAttrInterface>(result);
result = cast<DINodeAttr>(recType.withRecId(recSelf.getRecId()));
// Remove this recSelf from the set of unbound selfRefs.
state.unboundSelfRefs.erase(recSelf);
}
return nullptr;

// Insert the result into our internal cache if it's not self-contained.
if (!state.unboundSelfRefs.empty()) {
auto [_, inserted] = dependentCache.try_emplace(
node, DependentTranslation{result, state.unboundSelfRefs});
assert(inserted && "invalid state: caching the same DINode twice");
return {result, false};
}
return {result, true};
}

void DebugImporter::RecursionPruner::popTranslationStack(llvm::DINode *node) {
// If `node` is not a potentially recursive type, it will not be on the
// translation stack. Nothing to handle in this case.
if (translationStack.empty() || translationStack.back().first != node)
return;

// At the end of the stack, all unbound self-refs must be resolved already,
// and the entire cache should be accounted for.
TranslationState &currLayerState = translationStack.back().second;
if (translationStack.size() == 1) {
assert(currLayerState.unboundSelfRefs.empty() &&
"internal error: unbound recursive self reference at top level.");
translationStack.pop_back();
return;
}

// Copy unboundSelfRefs down to the previous level.
TranslationState &nextLayerState = (++translationStack.rbegin())->second;
nextLayerState.unboundSelfRefs.insert(currLayerState.unboundSelfRefs.begin(),
currLayerState.unboundSelfRefs.end());
translationStack.pop_back();
}

DINodeAttr DebugImporter::RecursionPruner::lookup(llvm::DINode *node) {
auto cacheIter = dependentCache.find(node);
if (cacheIter == dependentCache.end())
return {};

DependentTranslation &entry = cacheIter->second;
if (llvm::set_is_subset(entry.unboundSelfRefs,
translationStack.back().second.unboundSelfRefs))
return entry.attr;

// Stale cache entry.
dependentCache.erase(cacheIter);
return {};
}

//===----------------------------------------------------------------------===//
Expand Down Expand Up @@ -394,13 +478,3 @@ DistinctAttr DebugImporter::getOrCreateDistinctID(llvm::DINode *node) {
id = DistinctAttr::create(UnitAttr::get(context));
return id;
}

function_ref<DIRecursiveTypeAttrInterface(DistinctAttr)>
DebugImporter::getRecSelfConstructor(llvm::DINode *node) {
using CtorType = function_ref<DIRecursiveTypeAttrInterface(DistinctAttr)>;
return TypeSwitch<llvm::DINode *, CtorType>(node)
.Case([&](llvm::DICompositeType *concreteNode) {
return CtorType(DICompositeTypeAttr::getRecSelf);
})
.Default(CtorType());
}
105 changes: 91 additions & 14 deletions mlir/lib/Target/LLVMIR/DebugImporter.h
Original file line number Diff line number Diff line change
Expand Up @@ -29,8 +29,7 @@ namespace detail {

class DebugImporter {
public:
DebugImporter(ModuleOp mlirModule)
: context(mlirModule.getContext()), mlirModule(mlirModule) {}
DebugImporter(ModuleOp mlirModule);

/// Translates the given LLVM debug location to an MLIR location.
Location translateLoc(llvm::DILocation *loc);
Expand Down Expand Up @@ -86,24 +85,102 @@ class DebugImporter {
/// for it, or create a new one if not.
DistinctAttr getOrCreateDistinctID(llvm::DINode *node);

/// Get the `getRecSelf` constructor for the translated type of `node` if its
/// translated DITypeAttr supports recursion. Otherwise, returns nullptr.
function_ref<DIRecursiveTypeAttrInterface(DistinctAttr)>
getRecSelfConstructor(llvm::DINode *node);

/// A mapping between LLVM debug metadata and the corresponding attribute.
DenseMap<llvm::DINode *, DINodeAttr> nodeToAttr;
/// A mapping between distinct LLVM debug metadata nodes and the corresponding
/// distinct id attribute.
DenseMap<llvm::DINode *, DistinctAttr> nodeToDistinctAttr;

/// A stack that stores the metadata nodes that are being traversed. The stack
/// is used to detect cyclic dependencies during the metadata translation.
/// A node is pushed with a null value. If it is ever seen twice, it is given
/// a recursive id attribute, indicating that it is a recursive node.
llvm::MapVector<llvm::DINode *, DistinctAttr> translationStack;
/// All the unbound recursive self references in the translation stack.
SmallVector<DenseSet<DistinctAttr>> unboundRecursiveSelfRefs;
/// Translation helper for recursive DINodes.
/// Works alongside a stack-based DINode translator (the "main translator")
/// for gracefully handling DINodes that are recursive.
///
/// Usage:
/// - Before translating a node, call `pruneOrPushTranslationStack` to see if
/// the pruner can preempt this translation. If this is a node that the
/// pruner already knows how to handle, it will return the translated
/// DINodeAttr.
/// - After a node is successfully translated by the main translator, call
/// `finalizeTranslation` to save the translated result with the pruner, and
/// give it a chance to further modify the result.
/// - Regardless of success or failure by the main translator, always call
/// `popTranslationStack` at the end of translating a node. This is
/// necessary to keep the internal book-keeping in sync.
///
/// This helper maintains an internal cache so that no recursive type will
/// be translated more than once by the main translator.
/// This internal cache is different from the cache maintained by the main
/// translator because it may store nodes that are not self-contained (i.e.
/// contain unbounded recursive self-references).
class RecursionPruner {
public:
RecursionPruner(MLIRContext *context) : context(context) {}

/// If this node is a recursive instance that was previously seen, returns a
/// self-reference. If this node was previously cached, returns the cached
/// result. Otherwise, returns null attr, and a translation stack frame is
/// created for this node. Expects `finalizeTranslation` &
/// `popTranslationStack` to be called on this node later.
DINodeAttr pruneOrPushTranslationStack(llvm::DINode *node);

/// Register the translated result of `node`. Returns the finalized result
/// (with recId if recursive) and whether the result is self-contained
/// (i.e. contains no unbound self-refs).
std::pair<DINodeAttr, bool> finalizeTranslation(llvm::DINode *node,
DINodeAttr result);

/// Pop off a frame from the translation stack after a node is done being
/// translated.
void popTranslationStack(llvm::DINode *node);

private:
/// Returns the cached result (if exists) or null.
/// The cache entry will be removed if not all of its dependent self-refs
/// exists.
DINodeAttr lookup(llvm::DINode *node);

MLIRContext *context;

/// A cached translation that contains the translated attribute as well
/// as any unbound self-references that it depends on.
struct DependentTranslation {
/// The translated attr. May contain unbound self-references for other
/// recursive attrs.
DINodeAttr attr;
/// The set of unbound self-refs that this cached entry refers to. All
/// these self-refs must exist for the cached entry to be valid.
DenseSet<DIRecursiveTypeAttrInterface> unboundSelfRefs;
};
/// A mapping between LLVM debug metadata and the corresponding attribute.
/// Only contains those with unboundSelfRefs. Fully self-contained attrs
/// will be cached by the outer main translator.
DenseMap<llvm::DINode *, DependentTranslation> dependentCache;

/// Each potentially recursive node will have a TranslationState pushed onto
/// the `translationStack` to keep track of whether this node is actually
/// recursive (i.e. has self-references inside), and other book-keeping.
struct TranslationState {
/// The rec-self if this node is indeed a recursive node (i.e. another
/// instance of itself is seen while translating it). Null if this node
/// has not been seen again deeper in the translation stack.
DIRecursiveTypeAttrInterface recSelf;
/// All the unbound recursive self references in this layer of the
/// translation stack.
DenseSet<DIRecursiveTypeAttrInterface> unboundSelfRefs;
};
/// A stack that stores the metadata nodes that are being traversed. The
/// stack is used to handle cyclic dependencies during metadata translation.
/// Each node is pushed with an empty TranslationState. If it is ever seen
/// later when the stack is deeper, the node is recursive, and its
/// TranslationState is assigned a recSelf.
llvm::MapVector<llvm::DINode *, TranslationState> translationStack;

/// A mapping between DINodes that are recursive, and their assigned recId.
/// This is kept so that repeated occurrences of the same node can reuse the
/// same ID and be deduplicated.
DenseMap<llvm::DINode *, DistinctAttr> nodeToRecId;
};
RecursionPruner recursionPruner;

MLIRContext *context;
ModuleOp mlirModule;
Expand Down
13 changes: 5 additions & 8 deletions mlir/lib/Target/LLVMIR/DebugTranslation.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -216,18 +216,15 @@ DebugTranslation::translateImpl(DIGlobalVariableAttr attr) {
llvm::DIType *
DebugTranslation::translateRecursive(DIRecursiveTypeAttrInterface attr) {
DistinctAttr recursiveId = attr.getRecId();
if (attr.isRecSelf()) {
auto *iter = recursiveTypeMap.find(recursiveId);
assert(iter != recursiveTypeMap.end() && "unbound DI recursive self type");
if (auto *iter = recursiveTypeMap.find(recursiveId);
iter != recursiveTypeMap.end()) {
return iter->second;
} else {
assert(!attr.isRecSelf() && "unbound DI recursive self type");
}

auto setRecursivePlaceholder = [&](llvm::DIType *placeholder) {
[[maybe_unused]] auto [iter, inserted] =
recursiveTypeMap.try_emplace(recursiveId, placeholder);
(void)iter;
(void)inserted;
assert(inserted && "illegal reuse of recursive id");
recursiveTypeMap.try_emplace(recursiveId, placeholder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We found some leak sanitizer issue with this locally :(. It looks like it can happen that the temporary remains in the IR somehow. My suspicion is that we need to overwrite the entry here now that we allow nested occurrences of the same recursive type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input.txt
Here is a reproducer of the leak (renamed to .txt due to github restrictions). When we run this with valgrind -- mlir-translate --mlir-to-llvm input.mlir, this shows that there is in fact a memory leak.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof 🤦 thanks for the test case. I think the emplace here is guaranteed to insert because the check above will return if the recId is found in the map. I see that this use-after-free is the temporary that was created, and then RAUW'ed, but referenced again. I'm afraid we might have just exposed an existing issue with the exporter? Let me take a closer look 👀 ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if my testing is correct, this fixes the memory leak: #88122. Apparently we were just caching temporary nodes, causing them to be reused after RAUW.

};

llvm::DIType *result =
Expand Down
Loading