Skip to content

Allow relative_to with recursive=False (in-place on objects) #1824

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

alejoe91
Copy link
Member

@alejoe91 alejoe91 commented Jul 13, 2023

Might fix #1819

For very long chains, dumping with recursive=True is a performance killer. However, that was required to make paths relative.
This PRs refactors completely the recursive_path_finder function to allow to modify the paths in place on BaseExtractor objects. Note that this is only possible if copy=False and it raises an exception if copy=True

See: https://github.com/alejoe91/spikeinterface/blob/non-recursive-relative-paths/src/spikeinterface/core/core_tools.py#L825-L896

  • Also added a helper function dict_contains_extractor to check the "status" of a dictionary.
  • Removed a bunch of d = that do not play well with debugger..

@DradeAW can you test this?

@alejoe91
Copy link
Member Author

An option to make it compatible with copy=True, would be to clone and replace extractor objects in the dictionary...but one thing at a time

@alejoe91 alejoe91 added the core Changes to core module label Jul 14, 2023
return recursive_path_modifier(d, func, target="path", copy=True)
def _make_paths_relative(d, relative, copy=True) -> dict:
relative = Path(relative).absolute()
func = lambda p: os.path.relpath(p, start=relative)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.path is compatible with pathlib objects

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an equivalent pathlib function as well:
https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.relative_to

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't work if paths are not direct relatives (i.e. it doesn't find .. locations)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can attest to that: I had a similar problem a few months back, and the pathlib equivalent did not work, and I had to use os.path.relpath. (link to where I used it: https://github.com/BarbourLab/lussac/blob/9c71ed6ad1303460566bf46bacf22c25ee805fe3/src/lussac/utils/plotting.py#L26)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
This is good to know. It is on a footnote on the link the I shared actually:

[PurePath.relative_to()](https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.relative_to) requires self to be the subpath of the argument, but [os.path.relpath()](https://docs.python.org/3/library/os.path.html#os.path.relpath) does not.

Double thanks, guys. I learn a couple of things about paths today. This is an enhancement that will be added in python 3:12, see the argument walk_up in the docs over there:

https://docs.python.org/3.12/library/pathlib.html#pathlib.PurePath.relative_to

The corresponding PR:
python/cpython#19813

if copy:
dc = deepcopy(d)
if dict_contains_extractors(dictionary):
raise Exception("The copy argument is only available if the input dictionary does not contain objects")
Copy link
Collaborator

@h-mayorquin h-mayorquin Jul 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise Exception("The copy argument is only available if the input dictionary does not contain objects")
raise Exception("The copy argument is only available if the input dictionary does not contain extractor objects")

?

Comment on lines +333 to +334
skip_recursive_path_modifier_warning: bool
If True, skip the warning that is raised when `recursive=True` and `relative_to` is not None.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which situation we would have recursive=True and relative_to not None ?
I think this option is too much detail and to_dict() is more or less internal.
So I think that either we are strict and we raise an error either we do not care but adding this flag is quite a bit heavy for the API no ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as you said, to_dict is internal so we can keep it in my opinion

Comment on lines -379 to -383
try:
dump_dict["version"] = imported_module.__version__
except AttributeError:
dump_dict["version"] = "unknown"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it moved elsewhere ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert relative_to.is_dir(), "'relative_to' must be an existing directory"
dump_dict = _make_paths_relative(dump_dict, relative_to)
copy = False if dict_contains_extractors(dump_dict) else True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to follow the logic here. Are we making an inplace modification of extarctors._kwargs ?
If this is the case I think this is not a good idea.
rec.to_dict() should not modify rec itself. Maybe I am missing something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are. that is the only way relative_to can work without recursive=True

Comment on lines +469 to +471
dictionary = self.to_dict(include_annotations=True, include_properties=True)
dictionary = deepcopy(dictionary)
clone = BaseExtractor.from_dict(dictionary)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer "d"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d doesn't work with the debugger.

def dump_to_json(self, file_path: Union[str, Path, None] = None, relative_to=None, folder_metadata=None) -> None:
def dump_to_json(
self, file_path: Union[str, Path, None] = None, relative_to=None, folder_metadata=None, recursive=False
) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand. I think that json should be always reccursive no ?
how could it be not reccurssive ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed it is, but it is handled by the SIJsonEncoder directly

Comment on lines +868 to +872
else:
if not skip_warning:
warnings.warn(
"The dictionary contains extractors, so the paths will be modified in-place!" "Use with caution"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to be agree with inplace modification. I think we should raise error or force a copy always.

@alejoe91
Copy link
Member Author

closing this in favor of a simpler patch

@alejoe91 alejoe91 closed this Jul 17, 2023
@alejoe91 alejoe91 deleted the non-recursive-relative-paths branch October 17, 2023 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to core module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Loading pipelines containing many nested sorting extractors are non-performant
4 participants