- """Class to perform under-sampling based on the edited nearest neighbour
+ """Classes to perform under-sampling based on the edited nearest neighbour
method."""

# Authors: Guillaume Lemaitre <[email protected] >
class EditedNearestNeighbours(BaseCleaningSampler):
"""Undersample based on the edited nearest neighbour method.

- This method will clean the database by removing samples close to the
- decision boundary.
+ This method cleans the dataset by removing samples close to the
+ decision boundary. It removes observations from the majority class or
+ classes when any or most of their closest neighbours are from a different class.

Read more in the :ref:`User Guide <edited_nearest_neighbors>`.

@@ -38,29 +39,31 @@ class EditedNearestNeighbours(BaseCleaningSampler):
{sampling_strategy}

n_neighbors : int or object, default=3
- If ``int``, size of the neighbourhood to consider to compute the
- nearest neighbors. If object, an estimator that inherits from
- :class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to
- find the nearest-neighbors.
+ If ``int``, size of the neighbourhood to consider for the undersampling, i.e.,
+ if `n_neighbors=3`, a sample will be removed when any or most of its 3 closest
+ neighbours are from a different class. If object, an estimator that inherits
+ from :class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to
+ find the nearest-neighbors. Note that when passing an estimator, you need a
+ 4-KNN to examine the 3 closest neighbours of a sample for the undersampling.

kind_sel : {{'all', 'mode'}}, default='all'
- Strategy to use in order to exclude samples.
+ Strategy to use to exclude samples.

- - If ``'all'``, all neighbours will have to agree with the samples of
- interest to not be excluded.
- - If ``'mode'``, the majority vote of the neighbours will be used in
- order to exclude a sample.
+ - If ``'all'``, all neighbours should be of the same class as the examined
+ sample for it not to be excluded.
+ - If ``'mode'``, most neighbours should be of the same class as the examined
+ sample for it not to be excluded.

The strategy `"all"` will be less conservative than `'mode'`. Thus,
- more samples will be removed when `kind_sel="all"` generally.
+ more samples will be removed when `kind_sel="all"`, generally.

{n_jobs}

Attributes
----------
sampling_strategy_ : dict
Dictionary containing the information to sample the dataset. The keys
- corresponds to the class labels from which to sample and the values
+ correspond to the class labels from which to sample and the values
are the number of samples to sample.

nn_ : estimator object
@@ -86,9 +89,9 @@ class EditedNearestNeighbours(BaseCleaningSampler):
--------
CondensedNearestNeighbour : Undersample by condensing samples.

- RepeatedEditedNearestNeighbours : Undersample by repeating ENN algorithm.
+ RepeatedEditedNearestNeighbours : Undersample by repeating the ENN algorithm.

- AllKNN : Undersample using ENN and various number of neighbours.
+ AllKNN : Undersample using ENN with a varying number of neighbours.

Notes
-----
@@ -197,7 +200,11 @@ def _more_tags(self):
class RepeatedEditedNearestNeighbours(BaseCleaningSampler):
"""Undersample based on the repeated edited nearest neighbour method.

- This method will repeat several time the ENN algorithm.
+ This method repeats the :class:`EditedNearestNeighbours` algorithm several times.
+ The repetitions will stop when i) the maximum number of iterations is reached,
+ or ii) no more observations are being removed, or iii) one of the majority classes
+ becomes a minority class, or iv) one of the majority classes disappears
+ during undersampling.

Read more in the :ref:`User Guide <edited_nearest_neighbors>`.

@@ -206,33 +213,34 @@ class RepeatedEditedNearestNeighbours(BaseCleaningSampler):
{sampling_strategy}

n_neighbors : int or object, default=3
- If ``int``, size of the neighbourhood to consider to compute the
- nearest neighbors. If object, an estimator that inherits from
- :class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to
- find the nearest-neighbors.
+ If ``int``, size of the neighbourhood to consider for the undersampling, i.e.,
+ if `n_neighbors=3`, a sample will be removed when any or most of its 3 closest
+ neighbours are from a different class. If object, an estimator that inherits
+ from :class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to
+ find the nearest-neighbors. Note that when passing an estimator, you need a
+ 4-KNN to examine the 3 closest neighbours of a sample for the undersampling.

max_iter : int, default=100
- Maximum number of iterations of the edited nearest neighbours
- algorithm for a single run.
+ Maximum number of iterations of the edited nearest neighbours algorithm.

kind_sel : {{'all', 'mode'}}, default='all'
- Strategy to use in order to exclude samples.
+ Strategy to use to exclude samples.

- - If ``'all'``, all neighbours will have to agree with the samples of
- interest to not be excluded.
- - If ``'mode'``, the majority vote of the neighbours will be used in
- order to exclude a sample.
+ - If ``'all'``, all neighbours should be of the same class as the examined
+ sample for it not to be excluded.
+ - If ``'mode'``, most neighbours should be of the same class as the examined
+ sample for it not to be excluded.

The strategy `"all"` will be less conservative than `'mode'`. Thus,
- more samples will be removed when `kind_sel="all"` generally.
+ more samples will be removed when `kind_sel="all"`, generally.

{n_jobs}

Attributes
----------
sampling_strategy_ : dict
Dictionary containing the information to sample the dataset. The keys
- corresponds to the class labels from which to sample and the values
+ correspond to the class labels from which to sample and the values
are the number of samples to sample.

nn_ : estimator object
@@ -269,7 +277,7 @@ class RepeatedEditedNearestNeighbours(BaseCleaningSampler):

EditedNearestNeighbours : Undersample by editing samples.

- AllKNN : Undersample using ENN and various number of neighbours.
+ AllKNN : Undersample using ENN with a varying number of neighbours.

Notes
-----
@@ -413,8 +421,12 @@ def _more_tags(self):
class AllKNN(BaseCleaningSampler):
"""Undersample based on the AllKNN method.

- This method will apply ENN several time and will vary the number of nearest
- neighbours.
+ This method will apply :class:`EditedNearestNeighbours` several times, varying the
+ number of nearest neighbours at each round. It begins by examining the 1 closest
+ neighbour, and it increases the neighbourhood by 1 at each round.
+
+ The algorithm stops when the maximum number of neighbours has been examined or
+ when the majority class becomes the minority class, whichever comes first.

Read more in the :ref:`User Guide <edited_nearest_neighbors>`.

@@ -423,21 +435,23 @@ class AllKNN(BaseCleaningSampler):
{sampling_strategy}

n_neighbors : int or estimator object, default=3
- If ``int``, size of the neighbourhood to consider to compute the
- nearest neighbors. If object, an estimator that inherits from
- :class:`~sklearn.neighbors.base.KNeighborsMixin` that will be used to
- find the nearest-neighbors. By default, it will be a 3-NN.
+ If ``int``, size of the maximum neighbourhood to examine for the undersampling.
+ If `n_neighbors=3`, in the first iteration the algorithm will examine 1 closest
+ neighbour, in the second round 2, and in the final round 3. If object, an
+ estimator that inherits from :class:`~sklearn.neighbors.base.KNeighborsMixin`
+ that will be used to find the nearest-neighbors. Note that when passing an
+ estimator, you need a 4-KNN to examine the 3 closest neighbours of a sample.

kind_sel : {{'all', 'mode'}}, default='all'
- Strategy to use in order to exclude samples.
+ Strategy to use to exclude samples.

- - If ``'all'``, all neighbours will have to agree with the samples of
- interest to not be excluded.
- - If ``'mode'``, the majority vote of the neighbours will be used in
- order to exclude a sample.
+ - If ``'all'``, all neighbours should be of the same class as the examined
+ sample for it not to be excluded.
+ - If ``'mode'``, most neighbours should be of the same class as the examined
+ sample for it not to be excluded.

The strategy `"all"` will be less conservative than `'mode'`. Thus,
- more samples will be removed when `kind_sel="all"` generally.
+ more samples will be removed when `kind_sel="all"`, generally.

allow_minority : bool, default=False
If ``True``, it allows the majority classes to become the minority
@@ -451,7 +465,7 @@ class without early stopping.
----------
sampling_strategy_ : dict
Dictionary containing the information to sample the dataset. The keys
- corresponds to the class labels from which to sample and the values
+ correspond to the class labels from which to sample and the values
are the number of samples to sample.

nn_ : estimator object
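The `kind_sel` rule described in these docstrings can be illustrated with a small standalone sketch. This is not the library's implementation: it uses only the Python standard library, and `enn_keep_mask` and `target_classes` are hypothetical names introduced here for illustration. It assumes Euclidean distance and skips the sample itself when ranking neighbours, which is one way to read the "pass a 4-KNN to examine 3 neighbours" note: an estimator queried on its own training data would typically return each sample as its own closest neighbour.

```python
from collections import Counter
import math


def enn_keep_mask(X, y, target_classes, n_neighbors=3, kind_sel="all"):
    """Return one boolean per sample; True means the sample is kept.

    Hypothetical helper, not part of any library API.
    """
    if kind_sel not in ("all", "mode"):
        raise ValueError("kind_sel must be 'all' or 'mode'")
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi not in target_classes:
            # Only the targeted classes (e.g. the majority) are cleaned.
            keep.append(True)
            continue
        # Rank every *other* sample by Euclidean distance to sample i.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )
        labels = [y[j] for j in others[:n_neighbors]]
        if kind_sel == "all":
            # Kept only if every neighbour shares the sample's class.
            keep.append(all(label == yi for label in labels))
        else:
            # Kept if the most frequent neighbour class is the sample's class.
            keep.append(Counter(labels).most_common(1)[0][0] == yi)
    return keep


# Toy 1-D data: class 0 is the majority (6 samples), class 1 the minority (4).
X = [(0.0,), (1.0,), (2.0,), (3.0,), (5.0,), (9.6,),
     (6.0,), (9.0,), (10.0,), (11.0,)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

keep_all = enn_keep_mask(X, y, target_classes={0}, kind_sel="all")
keep_mode = enn_keep_mask(X, y, target_classes={0}, kind_sel="mode")
print(keep_all.count(False), keep_mode.count(False))
```

On this toy data, the border sample at 5.0 has one minority neighbour out of three, so it is dropped only under `'all'`, while the sample at 9.6 sits inside the minority cluster and is dropped under both settings. That matches the docstring's statement that `'all'` generally removes more samples than `'mode'`.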