DOC improve documentation for ENN and variants #1024
Merged: glemaitre merged 6 commits into scikit-learn-contrib:master from solegalli:update_ENN_docstrings on Jul 11, 2023
Changes from 2 commits
Commits (6)
0b92e8f update docstrigs of ENN and variants (solegalli)
ad2e75b final touches (solegalli)
2e48da7 change verb (solegalli)
ef94a23 add link to class enn (solegalli)
9d6c4d6 add link to class enn (solegalli)
d27c87a shorten lines (solegalli)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
"""Class to perform under-sampling based on the edited nearest neighbour | ||
"""Classes to perform under-sampling based on the edited nearest neighbour | ||
method.""" | ||
|
||
# Authors: Guillaume Lemaitre <[email protected]> | ||
|
@@ -28,8 +28,9 @@ | |
class EditedNearestNeighbours(BaseCleaningSampler): | ||
"""Undersample based on the edited nearest neighbour method. | ||
|
||
This method will clean the database by removing samples close to the | ||
decision boundary. | ||
This method will clean the dataset by removing samples close to the | ||
decision boundary. It removes observations from the majority class or | ||
classes when any or most of their closest neighbours are from a different class. | ||
|
||
Read more in the :ref:`User Guide <edited_nearest_neighbors>`. | ||
|
||
|
@@ -38,29 +39,31 @@ class EditedNearestNeighbours(BaseCleaningSampler): | |
{sampling_strategy} | ||
|
||
n_neighbors : int or object, default=3 | ||
If ``int``, size of the neighbourhood to consider to compute the | ||
nearest neighbors. If object, an estimator that inherits from | ||
:class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to | ||
find the nearest-neighbors. | ||
If ``int``, size of the neighbourhood to consider for the undersampling, i.e., | ||
if `n_neighbors=3`, a sample will be removed when any or most of its 3 closest | ||
neighbours are from a different class. If object, an estimator that inherits | ||
from :class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to | ||
find the nearest-neighbors. Note that if you want to examine the 3 closest | ||
neighbours of a sample for the undersampling, you need to pass a 4-KNN. | ||
|
||
kind_sel : {{'all', 'mode'}}, default='all' | ||
Strategy to use in order to exclude samples. | ||
Strategy to use to exclude samples. | ||
|
||
- If ``'all'``, all neighbours will have to agree with the samples of | ||
interest to not be excluded. | ||
- If ``'mode'``, the majority vote of the neighbours will be used in | ||
order to exclude a sample. | ||
- If ``'all'``, all neighbours should be of the same class as the examined | ||
sample for it not to be excluded. | ||
- If ``'mode'``, most neighbours should be of the same class as the examined | ||
sample for it not to be excluded. | ||
|
||
The strategy `"all"` will be less conservative than `'mode'`. Thus, | ||
more samples will be removed when `kind_sel="all"` generally. | ||
more samples will be removed when `kind_sel="all"`, generally. | ||
|
||
{n_jobs} | ||
|
||
Attributes | ||
---------- | ||
sampling_strategy_ : dict | ||
Dictionary containing the information to sample the dataset. The keys | ||
corresponds to the class labels from which to sample and the values | ||
correspond to the class labels from which to sample and the values | ||
are the number of samples to sample. | ||
|
||
nn_ : estimator object | ||
|
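The selection rule described in the `n_neighbors` and `kind_sel` entries can be illustrated with a minimal, pure-Python sketch. This is not the imbalanced-learn implementation: the helper names `nearest_neighbours` and `enn_keep_mask`, the 1-D toy data, and the index-based tie-breaking are assumptions made for illustration. The sketch excludes the examined sample from its own neighbourhood explicitly, which is presumably what the extra neighbour in the "4-KNN" note accounts for when a fitted estimator object is passed.

```python
from collections import Counter


def nearest_neighbours(X, i, k):
    """Indices of the k points closest to X[i], excluding i itself (1-D data)."""
    others = [j for j in range(len(X)) if j != i]
    # Ties in distance are broken by index (an arbitrary choice for this sketch).
    others.sort(key=lambda j: (abs(X[j] - X[i]), j))
    return others[:k]


def enn_keep_mask(X, y, classes_to_clean, n_neighbors=3, kind_sel="all"):
    """Return True for samples that survive one ENN pass (hypothetical helper)."""
    keep = []
    for i in range(len(X)):
        if y[i] not in classes_to_clean:
            keep.append(True)  # only the targeted classes are cleaned
            continue
        labels = [y[j] for j in nearest_neighbours(X, i, n_neighbors)]
        if kind_sel == "all":
            # Every neighbour must share the sample's class.
            keep.append(all(label == y[i] for label in labels))
        else:
            # 'mode': the most frequent neighbour class must match.
            keep.append(Counter(labels).most_common(1)[0][0] == y[i])
    return keep


# Toy data: class 0 is the majority class being cleaned.
X = [0.0, 1.0, 2.0, 3.0, 3.5, 4.0, 4.5]
y = [0, 0, 0, 0, 1, 1, 0]

mask_all = enn_keep_mask(X, y, {0}, n_neighbors=3, kind_sel="all")
mask_mode = enn_keep_mask(X, y, {0}, n_neighbors=3, kind_sel="mode")
print(mask_all)
print(mask_mode)
```

On this toy data the sample at `2.0` has neighbours with classes `0, 0, 1`, so it is dropped under `'all'` but retained under `'mode'`, which matches the docstring's remark that `'all'` generally removes more samples.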
@@ -86,9 +89,9 @@ class EditedNearestNeighbours(BaseCleaningSampler): | |
-------- | ||
CondensedNearestNeighbour : Undersample by condensing samples. | ||
|
||
RepeatedEditedNearestNeighbours : Undersample by repeating ENN algorithm. | ||
RepeatedEditedNearestNeighbours : Undersample by repeating the ENN algorithm. | ||
|
||
AllKNN : Undersample using ENN and various number of neighbours. | ||
AllKNN : Undersample using ENN with varying neighbours. | ||
|
||
Notes | ||
----- | ||
|
@@ -197,7 +200,11 @@ def _more_tags(self): | |
class RepeatedEditedNearestNeighbours(BaseCleaningSampler): | ||
"""Undersample based on the repeated edited nearest neighbour method. | ||
|
||
This method will repeat several time the ENN algorithm. | ||
This method repeats the ENN algorithm several times. The repetitions | ||
will stop when i) the maximum number of iterations is reached, or ii) no | ||
more observations are being removed, or iii) one of the majority classes | ||
becomes a minority class or iv) one of the majority classes disappears | ||
during undersampling. | ||
|
||
Read more in the :ref:`User Guide <edited_nearest_neighbors>`. | ||
|
||
|
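The four stopping conditions listed in the RepeatedEditedNearestNeighbours docstring can be sketched as a loop around a single ENN pass. This is a simplified illustration under stated assumptions, not the library's code: `enn_step` and `repeated_enn` are hypothetical helpers, the data are 1-D, only the `'all'` rule is implemented, and `n_neighbors=1` is used to keep the toy example small.

```python
from collections import Counter


def enn_step(X, y, classes_to_clean, n_neighbors=1):
    """One ENN pass with the 'all' rule on 1-D data; returns the kept (X, y)."""
    kept_X, kept_y = [], []
    for i in range(len(X)):
        if y[i] in classes_to_clean:
            order = sorted(
                (j for j in range(len(X)) if j != i),
                key=lambda j: (abs(X[j] - X[i]), j),
            )
            if any(y[j] != y[i] for j in order[:n_neighbors]):
                continue  # a neighbour disagrees: drop the sample
        kept_X.append(X[i])
        kept_y.append(y[i])
    return kept_X, kept_y


def repeated_enn(X, y, classes_to_clean, n_neighbors=1, max_iter=100):
    """Repeat ENN until one of the four stopping conditions is met."""
    minority_count = min(Counter(y).values())
    n_classes = len(set(y))
    n_iter = 0
    for n_iter in range(1, max_iter + 1):  # i) maximum number of iterations
        new_X, new_y = enn_step(X, y, classes_to_clean, n_neighbors)
        counts = Counter(new_y)
        converged = len(new_y) == len(y)  # ii) nothing was removed
        became_minority = any(  # iii) a cleaned class fell below the minority
            counts[c] < minority_count for c in classes_to_clean
        )
        class_vanished = len(counts) < n_classes  # iv) a class disappeared
        X, y = new_X, new_y
        if converged or became_minority or class_vanished:
            break
    return X, y, n_iter


X = [0.0, 1.0, 2.0, 3.0, 4.25, 4.75, 5.0, 6.0, 7.0]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_res, y_res, n_iter = repeated_enn(X, y, {0})
print(X_res, y_res, n_iter)
```

Here the first pass removes `4.75`, which exposes `4.25` to the minority class so the second pass removes it too; the third pass removes nothing and the loop stops by condition ii).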
@@ -206,33 +213,34 @@ class RepeatedEditedNearestNeighbours(BaseCleaningSampler): | |
{sampling_strategy} | ||
|
||
n_neighbors : int or object, default=3 | ||
If ``int``, size of the neighbourhood to consider to compute the | ||
nearest neighbors. If object, an estimator that inherits from | ||
:class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to | ||
find the nearest-neighbors. | ||
If ``int``, size of the neighbourhood to consider for the undersampling, i.e., | ||
if `n_neighbors=3`, a sample will be removed when any or most of its 3 closest | ||
neighbours are from a different class. If object, an estimator that inherits | ||
from :class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to | ||
find the nearest-neighbors. Note that if you want to examine the 3 closest | ||
neighbours of a sample for the undersampling, you need to pass a 4-KNN. | ||
|
||
max_iter : int, default=100 | ||
Maximum number of iterations of the edited nearest neighbours | ||
algorithm for a single run. | ||
Maximum number of iterations of the edited nearest neighbours. | ||
|
||
kind_sel : {{'all', 'mode'}}, default='all' | ||
Strategy to use in order to exclude samples. | ||
Strategy to use to exclude samples. | ||
|
||
- If ``'all'``, all neighbours will have to agree with the samples of | ||
interest to not be excluded. | ||
- If ``'mode'``, the majority vote of the neighbours will be used in | ||
order to exclude a sample. | ||
- If ``'all'``, all neighbours should be of the same class as the examined | ||
sample for it not to be excluded. | ||
- If ``'mode'``, most neighbours should be of the same class as the examined | ||
sample for it not to be excluded. | ||
|
||
The strategy `"all"` will be less conservative than `'mode'`. Thus, | ||
more samples will be removed when `kind_sel="all"` generally. | ||
more samples will be removed when `kind_sel="all"`, generally. | ||
|
||
{n_jobs} | ||
|
||
Attributes | ||
---------- | ||
sampling_strategy_ : dict | ||
Dictionary containing the information to sample the dataset. The keys | ||
corresponds to the class labels from which to sample and the values | ||
correspond to the class labels from which to sample and the values | ||
are the number of samples to sample. | ||
|
||
nn_ : estimator object | ||
|
@@ -269,7 +277,7 @@ class RepeatedEditedNearestNeighbours(BaseCleaningSampler): | |
|
||
EditedNearestNeighbours : Undersample by editing samples. | ||
|
||
AllKNN : Undersample using ENN and various number of neighbours. | ||
AllKNN : Undersample using ENN with varying neighbours. | ||
|
||
Notes | ||
----- | ||
|
@@ -413,8 +421,12 @@ def _more_tags(self): | |
class AllKNN(BaseCleaningSampler): | ||
"""Undersample based on the AllKNN method. | ||
|
||
This method will apply ENN several time and will vary the number of nearest | ||
neighbours. | ||
This method will apply ENN several times varying the number of nearest | ||
neighbours at each round. It begins by examining 1 closest neighbour, and | ||
it increases the neighbourhood by 1 at each round. | ||
|
||
The algorithm stops when the maximum number of neighbours has been examined or | ||
when the majority class becomes the minority class, whichever comes first. | ||
|
||
Read more in the :ref:`User Guide <edited_nearest_neighbors>`. | ||
|
||
|
@@ -423,21 +435,23 @@ class AllKNN(BaseCleaningSampler): | |
{sampling_strategy} | ||
|
||
n_neighbors : int or estimator object, default=3 | ||
If ``int``, size of the neighbourhood to consider to compute the | ||
nearest neighbors. If object, an estimator that inherits from | ||
:class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to | ||
find the nearest-neighbors. By default, it will be a 3-NN. | ||
If ``int``, size of the maximum neighbourhood to examine for the undersampling. | ||
If `n_neighbors=3`, in the first iteration the algorithm will examine 1 closest | ||
neighbour, in the second round 2, and in the final round 3. If object, an | ||
estimator that inherits from :class:`~sklearn.neighbors.base.KNeighborsMixin` | ||
that will be used to find the nearest-neighbors. Note that if you want to | ||
examine the 3 closest neighbours of a sample, you need to pass a 4-KNN. | ||
|
||
kind_sel : {{'all', 'mode'}}, default='all' | ||
Strategy to use in order to exclude samples. | ||
Strategy to use to exclude samples. | ||
|
||
- If ``'all'``, all neighbours will have to agree with the samples of | ||
interest to not be excluded. | ||
- If ``'mode'``, the majority vote of the neighbours will be used in | ||
order to exclude a sample. | ||
- If ``'all'``, all neighbours should be of the same class as the examined | ||
sample for it not to be excluded. | ||
- If ``'mode'``, most neighbours should be of the same class as the examined | ||
sample for it not to be excluded. | ||
|
||
The strategy `"all"` will be less conservative than `'mode'`. Thus, | ||
more samples will be removed when `kind_sel="all"` generally. | ||
more samples will be removed when `kind_sel="all"`, generally. | ||
|
||
allow_minority : bool, default=False | ||
If ``True``, it allows the majority classes to become the minority | ||
|
@@ -451,7 +465,7 @@ class without early stopping. | |
---------- | ||
sampling_strategy_ : dict | ||
Dictionary containing the information to sample the dataset. The keys | ||
corresponds to the class labels from which to sample and the values | ||
correspond to the class labels from which to sample and the values | ||
are the number of samples to sample. | ||
|
||
nn_ : estimator object | ||
|
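The AllKNN schedule described in the docstring (1 neighbour in the first round, then 2, up to `n_neighbors`) can be sketched in the same simplified style. As before, this is an illustration rather than the library's implementation: `enn_pass` and `all_knn` are hypothetical names, the data are 1-D, only the `'all'` rule is shown, and the early-stopping check is an assumption modelled on the docstring's wording about `allow_minority`.

```python
from collections import Counter


def enn_pass(X, y, classes_to_clean, k):
    """One ENN pass ('all' rule) on 1-D data using the k closest neighbours."""
    kept_X, kept_y = [], []
    for i in range(len(X)):
        if y[i] in classes_to_clean:
            order = sorted(
                (j for j in range(len(X)) if j != i),
                key=lambda j: (abs(X[j] - X[i]), j),
            )
            if any(y[j] != y[i] for j in order[:k]):
                continue  # at least one of the k neighbours disagrees
        kept_X.append(X[i])
        kept_y.append(y[i])
    return kept_X, kept_y


def all_knn(X, y, classes_to_clean, n_neighbors=3, allow_minority=False):
    """Apply ENN with k = 1, 2, ..., n_neighbors, stopping early if needed."""
    minority_count = min(Counter(y).values())
    n_classes = len(set(y))
    for k in range(1, n_neighbors + 1):
        X, y = enn_pass(X, y, classes_to_clean, k)
        counts = Counter(y)
        became_minority = any(
            counts[c] < minority_count for c in classes_to_clean
        )
        class_vanished = len(counts) < n_classes
        # Without allow_minority, stop once a cleaned class is too small.
        if not allow_minority and (became_minority or class_vanished):
            break
    return X, y


X = [0.0, 1.0, 2.0, 3.0, 4.25, 4.75, 5.0, 6.0, 7.0]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_res, y_res = all_knn(X, y, {0}, n_neighbors=3)
print(X_res, y_res)
```

Each round widens the neighbourhood by one, so samples that survive the 1-neighbour round can still be removed once a second or third neighbour from another class comes into view.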