Commit 9a59070

solegalli and glemaitre authored
DOC improve documentation for ENN and variants (#1024)
Co-authored-by: Guillaume Lemaitre <[email protected]>
1 parent bcb675e commit 9a59070

1 file changed, +58 -44 lines changed

imblearn/under_sampling/_prototype_selection/_edited_nearest_neighbours.py

Lines changed: 58 additions & 44 deletions
@@ -1,4 +1,4 @@
-"""Class to perform under-sampling based on the edited nearest neighbour
+"""Classes to perform under-sampling based on the edited nearest neighbour
 method."""
 
 # Authors: Guillaume Lemaitre <[email protected]>
@@ -28,8 +28,9 @@
 class EditedNearestNeighbours(BaseCleaningSampler):
     """Undersample based on the edited nearest neighbour method.
 
-    This method will clean the database by removing samples close to the
-    decision boundary.
+    This method cleans the dataset by removing samples close to the
+    decision boundary. It removes observations from the majority class or
+    classes when any or most of its closest neighours are from a different class.
 
     Read more in the :ref:`User Guide <edited_nearest_neighbors>`.
 
@@ -38,29 +39,31 @@ class EditedNearestNeighbours(BaseCleaningSampler):
     {sampling_strategy}
 
     n_neighbors : int or object, default=3
-        If ``int``, size of the neighbourhood to consider to compute the
-        nearest neighbors. If object, an estimator that inherits from
-        :class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to
-        find the nearest-neighbors.
+        If ``int``, size of the neighbourhood to consider for the undersampling, i.e.,
+        if `n_neighbors=3`, a sample will be removed when any or most of its 3 closest
+        neighbours are from a different class. If object, an estimator that inherits
+        from :class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to
+        find the nearest-neighbors. Note that if you want to examine the 3 closest
+        neighbours of a sample for the undersampling, you need to pass a 4-KNN.
 
     kind_sel : {{'all', 'mode'}}, default='all'
-        Strategy to use in order to exclude samples.
+        Strategy to use to exclude samples.
 
-        - If ``'all'``, all neighbours will have to agree with the samples of
-          interest to not be excluded.
-        - If ``'mode'``, the majority vote of the neighbours will be used in
-          order to exclude a sample.
+        - If ``'all'``, all neighbours should be of the same class of the examined
+          sample for it not be excluded.
+        - If ``'mode'``, most neighbours should be of the same class of the examined
+          sample for it not be excluded.
 
         The strategy `"all"` will be less conservative than `'mode'`. Thus,
-        more samples will be removed when `kind_sel="all"` generally.
+        more samples will be removed when `kind_sel="all"`, generally.
 
     {n_jobs}
 
     Attributes
     ----------
     sampling_strategy_ : dict
         Dictionary containing the information to sample the dataset. The keys
-        corresponds to the class labels from which to sample and the values
+        correspond to the class labels from which to sample and the values
         are the number of samples to sample.
 
     nn_ : estimator object
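
Reviewer note: the `kind_sel` wording in the hunk above can be illustrated with a small standard-library sketch. This is not the imbalanced-learn implementation; `enn_keep_mask`, `classes_to_clean` and the toy data are hypothetical names chosen for illustration, and the sketch assumes Euclidean distance and excludes each sample from its own neighbourhood.

```python
from collections import Counter
from math import dist


def enn_keep_mask(points, labels, classes_to_clean, n_neighbors=3, kind_sel="all"):
    """Return one boolean per sample: True if the sample is kept."""
    keep = []
    for i, (point, label) in enumerate(zip(points, labels)):
        if label not in classes_to_clean:
            keep.append(True)  # untargeted classes are never cleaned
            continue
        # The n_neighbors closest samples, excluding the sample itself.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(point, points[j]),
        )
        neighbour_labels = [labels[j] for j in others[:n_neighbors]]
        if kind_sel == "all":
            # Kept only if every neighbour shares the sample's class.
            keep.append(all(nl == label for nl in neighbour_labels))
        elif kind_sel == "mode":
            # Kept if the most frequent neighbour class is the sample's class.
            most_common = Counter(neighbour_labels).most_common(1)[0][0]
            keep.append(most_common == label)
        else:
            raise ValueError("kind_sel must be 'all' or 'mode'")
    return keep


# Hypothetical 1-D toy data; class 0 is the majority class.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (3.4,), (9.0,), (9.5,), (10.0,)]
labels = [0, 0, 0, 0, 1, 1, 0, 1]

mask_all = enn_keep_mask(points, labels, {0}, kind_sel="all")
mask_mode = enn_keep_mask(points, labels, {0}, kind_sel="mode")
```

On this toy data, `'all'` drops the class-0 samples at 2.0, 3.0 and 9.5, while `'mode'` drops only the one at 9.5, which matches the docstring's remark that `'all'` generally removes more samples.
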
@@ -86,9 +89,9 @@ class EditedNearestNeighbours(BaseCleaningSampler):
     --------
     CondensedNearestNeighbour : Undersample by condensing samples.
 
-    RepeatedEditedNearestNeighbours : Undersample by repeating ENN algorithm.
+    RepeatedEditedNearestNeighbours : Undersample by repeating the ENN algorithm.
 
-    AllKNN : Undersample using ENN and various number of neighbours.
+    AllKNN : Undersample using ENN with varying neighbours.
 
     Notes
     -----
@@ -197,7 +200,11 @@ def _more_tags(self):
 class RepeatedEditedNearestNeighbours(BaseCleaningSampler):
     """Undersample based on the repeated edited nearest neighbour method.
 
-    This method will repeat several time the ENN algorithm.
+    This method repeats the :class:`EditedNearestNeighbours` algorithm several times.
+    The repetitions will stop when i) the maximum number of iterations is reached,
+    or ii) no more observations are being removed, or iii) one of the majority classes
+    becomes a minority class or iv) one of the majority classes disappears
+    during undersampling.
 
     Read more in the :ref:`User Guide <edited_nearest_neighbors>`.
 
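
Reviewer note: the four stopping conditions listed in the new docstring can be sketched as a plain loop. This is a simplified illustration under stated assumptions, not the library's code: `enn_pass` and `repeated_enn` are hypothetical helpers, only the `'all'` rule is implemented, the minority class is taken to be the least frequent label, and the result of the pass that triggers a stop is kept (the library may handle that case differently).

```python
from collections import Counter
from math import dist


def enn_pass(points, labels, minority, n_neighbors=3):
    """One ENN pass with the 'all' rule; minority samples are never removed."""
    kept = []
    for i, (point, label) in enumerate(zip(points, labels)):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(point, points[j]),
        )
        if label == minority or all(labels[j] == label for j in order[:n_neighbors]):
            kept.append(i)
    return [points[i] for i in kept], [labels[i] for i in kept]


def repeated_enn(points, labels, n_neighbors=3, max_iter=100):
    """Repeat ENN passes and report which condition stopped the loop."""
    initial = Counter(labels)
    minority = min(initial, key=initial.get)
    reason = "max_iter reached"  # condition i)
    for _ in range(max_iter):
        new_points, new_labels = enn_pass(points, labels, minority, n_neighbors)
        nothing_removed = len(new_labels) == len(labels)
        points, labels = new_points, new_labels
        counts = Counter(labels)
        if nothing_removed:
            reason = "no more samples removed"  # condition ii)
            break
        if len(counts) < len(initial):
            reason = "a majority class disappeared"  # condition iv)
            break
        if any(n < initial[minority] for c, n in counts.items() if c != minority):
            reason = "a majority class became a minority"  # condition iii)
            break
    return points, labels, reason


# Hypothetical toy data: overlapping classes, then well-separated classes.
overlap = ([(0.0,), (1.0,), (2.0,), (3.0,), (3.4,), (9.0,), (9.5,), (10.0,)],
           [0, 0, 0, 0, 1, 1, 0, 1])
separated = ([(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (100.0,), (101.0,)],
             [0, 0, 0, 0, 0, 1, 1])

_, overlap_labels, overlap_reason = repeated_enn(*overlap)
_, separated_labels, separated_reason = repeated_enn(*separated)
```

With the overlapping data the first pass already leaves class 0 with fewer samples than the original minority class, so the loop stops on condition iii); with the separated data nothing is removed and it stops on condition ii).
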
@@ -206,33 +213,34 @@ class RepeatedEditedNearestNeighbours(BaseCleaningSampler):
     {sampling_strategy}
 
     n_neighbors : int or object, default=3
-        If ``int``, size of the neighbourhood to consider to compute the
-        nearest neighbors. If object, an estimator that inherits from
-        :class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to
-        find the nearest-neighbors.
+        If ``int``, size of the neighbourhood to consider for the undersampling, i.e.,
+        if `n_neighbors=3`, a sample will be removed when any or most of its 3 closest
+        neighbours are from a different class. If object, an estimator that inherits
+        from :class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to
+        find the nearest-neighbors. Note that if you want to examine the 3 closest
+        neighbours of a sample for the undersampling, you need to pass a 4-KNN.
 
     max_iter : int, default=100
-        Maximum number of iterations of the edited nearest neighbours
-        algorithm for a single run.
+        Maximum number of iterations of the edited nearest neighbours.
 
     kind_sel : {{'all', 'mode'}}, default='all'
-        Strategy to use in order to exclude samples.
+        Strategy to use to exclude samples.
 
-        - If ``'all'``, all neighbours will have to agree with the samples of
-          interest to not be excluded.
-        - If ``'mode'``, the majority vote of the neighbours will be used in
-          order to exclude a sample.
+        - If ``'all'``, all neighbours should be of the same class of the examined
+          sample for it not be excluded.
+        - If ``'mode'``, most neighbours should be of the same class of the examined
+          sample for it not be excluded.
 
         The strategy `"all"` will be less conservative than `'mode'`. Thus,
-        more samples will be removed when `kind_sel="all"` generally.
+        more samples will be removed when `kind_sel="all"`, generally.
 
     {n_jobs}
 
     Attributes
     ----------
     sampling_strategy_ : dict
         Dictionary containing the information to sample the dataset. The keys
-        corresponds to the class labels from which to sample and the values
+        correspond to the class labels from which to sample and the values
         are the number of samples to sample.
 
     nn_ : estimator object
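
Reviewer note: the "pass a 4-KNN to examine 3 neighbours" remark is easier to follow with a small example. A likely reading, stated here as an assumption rather than something the diff spells out, is that the neighbour search is run on the same samples it was fitted on, so each sample comes back as its own closest neighbour and has to be discarded. The `kneighbors` helper below is a hypothetical pure-Python stand-in, not the scikit-learn method.

```python
from math import dist


def kneighbors(fitted, query, k):
    """Indices of the k points in `fitted` closest to `query` (ties by index)."""
    return sorted(range(len(fitted)), key=lambda j: dist(query, fitted[j]))[:k]


# Hypothetical 1-D data; the query point is itself part of the fitted set.
points = [(0.0,), (1.0,), (2.5,), (4.5,), (7.0,)]

with_self = kneighbors(points, points[1], 4)  # a 4-NN search
true_neighbours = with_self[1:]               # drop the sample itself
```

The first index returned is the query sample at distance zero, so a 4-NN search leaves exactly 3 genuine neighbours once it is dropped.
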
@@ -269,7 +277,7 @@ class RepeatedEditedNearestNeighbours(BaseCleaningSampler):
 
     EditedNearestNeighbours : Undersample by editing samples.
 
-    AllKNN : Undersample using ENN and various number of neighbours.
+    AllKNN : Undersample using ENN with varying neighbours.
 
     Notes
     -----
@@ -413,8 +421,12 @@ def _more_tags(self):
 class AllKNN(BaseCleaningSampler):
     """Undersample based on the AllKNN method.
 
-    This method will apply ENN several time and will vary the number of nearest
-    neighbours.
+    This method will apply :class:`EditedNearestNeighbours` several times varying the
+    number of nearest neighbours at each round. It begins by examining 1 closest
+    neighbour, and it incrases the neighbourhood by 1 at each round.
+
+    The algorithm stops when the maximum number of neighbours are examined or
+    when the majority class becomes the minority class, whichever comes first.
 
     Read more in the :ref:`User Guide <edited_nearest_neighbors>`.
 
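
Reviewer note: the growing-neighbourhood schedule described in the new AllKNN docstring can be sketched as follows. Again this is a standard-library illustration under assumptions, not the library's implementation: `enn_pass` and `all_knn` are hypothetical helpers, only the `'all'` rule is used, and the round that triggers the early stop is kept.

```python
from collections import Counter
from math import dist


def enn_pass(points, labels, minority, n_neighbors):
    """One ENN pass with the 'all' rule; minority samples are never removed."""
    kept = []
    for i, (point, label) in enumerate(zip(points, labels)):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(point, points[j]),
        )
        if label == minority or all(labels[j] == label for j in order[:n_neighbors]):
            kept.append(i)
    return [points[i] for i in kept], [labels[i] for i in kept]


def all_knn(points, labels, n_neighbors=3, allow_minority=False):
    """Apply ENN with 1, 2, ..., n_neighbors neighbours, stopping early if needed."""
    initial = Counter(labels)
    minority = min(initial, key=initial.get)
    sizes_used = []
    for k in range(1, n_neighbors + 1):
        points, labels = enn_pass(points, labels, minority, k)
        sizes_used.append(k)
        counts = Counter(labels)
        # Early stop: some majority class now has fewer samples than the
        # original minority class (skipped when allow_minority is True).
        became_minority = any(
            counts.get(c, 0) < initial[minority] for c in initial if c != minority
        )
        if became_minority and not allow_minority:
            break
    return points, labels, sizes_used


# Hypothetical 1-D toy data; class 0 is the majority class.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (3.4,), (9.0,), (9.5,), (10.0,)]
labels = [0, 0, 0, 0, 1, 1, 0, 1]

_, strict_labels, strict_sizes = all_knn(points, labels)
_, loose_labels, loose_sizes = all_knn(points, labels, allow_minority=True)
```

On this toy data the default run stops after the 2-neighbour round, because class 0 has dropped below the original minority count, whereas `allow_minority=True` lets the schedule run through all three neighbourhood sizes.
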
@@ -423,21 +435,23 @@ class AllKNN(BaseCleaningSampler):
     {sampling_strategy}
 
     n_neighbors : int or estimator object, default=3
-        If ``int``, size of the neighbourhood to consider to compute the
-        nearest neighbors. If object, an estimator that inherits from
-        :class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to
-        find the nearest-neighbors. By default, it will be a 3-NN.
+        If ``int``, size of the maximum neighbourhood to examine for the undersampling.
+        If `n_neighbors=3`, in the first iteration the algorithm will examine 1 closest
+        neigbhour, in the second round 2, and in the final round 3. If object, an
+        estimator that inherits from :class:`~sklearn.neighbors.base.KNeighborsMixin`
+        that will be used to find the nearest-neighbors. Note that if you want to
+        examine the 3 closest neighbours of a sample, you need to pass a 4-KNN.
 
     kind_sel : {{'all', 'mode'}}, default='all'
-        Strategy to use in order to exclude samples.
+        Strategy to use to exclude samples.
 
-        - If ``'all'``, all neighbours will have to agree with the samples of
-          interest to not be excluded.
-        - If ``'mode'``, the majority vote of the neighbours will be used in
-          order to exclude a sample.
+        - If ``'all'``, all neighbours should be of the same class of the examined
+          sample for it not be excluded.
+        - If ``'mode'``, most neighbours should be of the same class of the examined
+          sample for it not be excluded.
 
         The strategy `"all"` will be less conservative than `'mode'`. Thus,
-        more samples will be removed when `kind_sel="all"` generally.
+        more samples will be removed when `kind_sel="all"`, generally.
 
     allow_minority : bool, default=False
         If ``True``, it allows the majority classes to become the minority
@@ -451,7 +465,7 @@ class without early stopping.
     ----------
     sampling_strategy_ : dict
         Dictionary containing the information to sample the dataset. The keys
-        corresponds to the class labels from which to sample and the values
+        correspond to the class labels from which to sample and the values
         are the number of samples to sample.
 
     nn_ : estimator object
