
Commit 438c19c

final edits
1 parent 3960305 commit 438c19c

File tree

3 files changed (+10, -7 lines)


doc/under_sampling.rst

Lines changed: 5 additions & 5 deletions
@@ -319,11 +319,11 @@ iteratively decide if a sample should be removed or not
 3. Train a 1-KNN on `C`.
 4. Go through the samples in set :math:`S`, sample by sample, and classify each one
    using a 1 nearest neighbor rule (trained in 3).
-5. If the sample is misclassified, add it to :math:`C`, otherwise do nothing.
+5. If the sample is misclassified, add it to :math:`C`, and go to step 6.
 6. Repeat steps 3 to 5 until all observations in `S` have been examined.

 The final dataset is `S`, containing all observations from the minority class and
-those from the majority that were miss-classified by the 1-KNN algorithms.
+those from the majority that were miss-classified by the successive 1-KNN algorithms.

 The :class:`CondensedNearestNeighbour` can be used in the following manner::
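
The condensation loop described in the hunk above can be sketched in a few lines of plain NumPy and scikit-learn. This is a hedged illustration only, not the library's implementation: the toy arrays `X` and `y`, the choice of class 0 as minority, and the single random seed sample are all assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
# Toy data (assumption): class 0 is the minority (kept whole),
# class 1 is the majority class being condensed.
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(3, 1, (40, 2))])
y = np.array([0] * 10 + [1] * 40)

# C starts with all minority samples plus one majority sample (steps 1-2).
maj = np.flatnonzero(y == 1)
C = list(np.flatnonzero(y == 0)) + [maj[0]]
S = list(maj[1:])

for idx in S:
    # Step 3: (re)train a 1-KNN on the current C.
    knn = KNeighborsClassifier(n_neighbors=1).fit(X[C], y[C])
    # Steps 4-5: a misclassified sample is added to C; otherwise move on.
    if knn.predict(X[idx][None, :])[0] != y[idx]:
        C.append(idx)

X_res, y_res = X[C], y[C]
print(f"kept {len(C)} of {len(X)} samples")
```

The retraining inside the loop is what step 6 ("repeat steps 3 to 5") amounts to: each newly added hard sample immediately influences the classification of the remaining ones.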

@@ -334,7 +334,7 @@ The :class:`CondensedNearestNeighbour` can be used in the following manner::
 [(0, 64), (1, 24), (2, 115)]

 However, as illustrated in the figure below, :class:`CondensedNearestNeighbour`
-is sensitive to noise and will add noisy samples.
+is sensitive to noise and may select noisy samples.

 In an attempt to remove noisy observations, :class:`OneSidedSelection`
 will first find the observations that are hard to classify, and then will use
@@ -345,8 +345,8 @@ will first find the observations that are hard to classify, and then will use
 2. Add a sample from the targeted class (class to be under-sampled) in
    :math:`C` and all other samples of this class in a set :math:`S`.
 3. Train a 1-KNN on `C`.
-4. Using a 1 nearest neighbor rule trained in 3, classify all samples
-   in set :math:`S`.
+4. Using a 1 nearest neighbor rule trained in 3, classify all samples in
+   set :math:`S`.
 5. Add all misclassified samples to :math:`C`.
 6. Remove Tomek Links from :math:`C`.
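
The :class:`OneSidedSelection` steps above differ from :class:`CondensedNearestNeighbour` in that the 1-KNN is trained only once. A minimal sketch of steps 1-5, under the same toy-data assumptions as before (class labels, sample counts, and seed are all hypothetical), with the Tomek-link pruning of step 6 deliberately omitted:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
# Toy data (assumption): class 0 is the minority,
# class 1 is the targeted class to be under-sampled.
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(3, 1, (40, 2))])
y = np.array([0] * 10 + [1] * 40)

maj = np.flatnonzero(y == 1)
C = list(np.flatnonzero(y == 0)) + [maj[0]]          # steps 1-2
S = maj[1:]

# Step 3: train the 1-KNN once on C.
knn = KNeighborsClassifier(n_neighbors=1).fit(X[C], y[C])
# Step 4: classify all of S in one pass.
pred = knn.predict(X[S])
# Step 5: keep only the misclassified (hard) samples.
C += list(S[pred != y[S]])

# Step 6 (Tomek-link removal) would prune C further; omitted here.
X_res, y_res = X[C], y[C]
print(f"kept {len(C)} of {len(X)} samples (before Tomek-link removal)")
```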

imblearn/under_sampling/_prototype_selection/_condensed_nearest_neighbour.py

Lines changed: 1 addition & 1 deletion
@@ -171,7 +171,7 @@ def _fit_resample(self, X, y):
             # Check each sample in S if we keep it or drop it
             for idx_sam, (x_sam, y_sam) in enumerate(zip(S_x, S_y)):

-                # Do not select sample which are already well classified
+                # Do not select samples which are already well classified
                 # (or were already selected -randomly- to be part of C)
                 if idx_sam in good_classif_label:
                     continue

imblearn/under_sampling/_prototype_selection/_tomek_links.py

Lines changed: 4 additions & 1 deletion
@@ -111,12 +111,15 @@ def is_tomek(y, nn_index, class_type):
         class_excluded = [c for c in np.unique(y) if c not in class_type]

         # there is a Tomek link between two samples if they are nearest
-        # neighbors and from a different class.
+        # neighbors of each other, and from a different class.
         for index_sample, target_sample in enumerate(y):
             if target_sample in class_excluded:
                 continue

             if y[nn_index[index_sample]] != target_sample:
+                # corroborate that they are neighbours of each other:
+                # (if A's closest neighbour is B, but B's closest neighbour
+                # is C, then A and B are not a Tomek link)
                 if nn_index[nn_index[index_sample]] == index_sample:
                     links[index_sample] = True
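
The mutual-neighbour condition the new comment describes can be verified on a tiny example. The three 1-D points below are hypothetical toy data; `nn_index` is rebuilt here with scikit-learn rather than taken from the library's internals. Point C's nearest neighbour is B, but B's nearest neighbour is A, so the pair (B, C) is rejected even though their classes differ:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy data (assumption): A=0.0 and B=1.0 are mutual nearest neighbours
# of different classes; C=2.5's nearest neighbour is B, but not vice versa.
X = np.array([[0.0], [1.0], [2.5]])
y = np.array([0, 1, 0])

# nn_index[i] = index of the nearest *other* sample
# (column 0 of kneighbors is the sample itself, at distance 0).
nn = NearestNeighbors(n_neighbors=2).fit(X)
nn_index = nn.kneighbors(X, return_distance=False)[:, 1]

links = np.zeros(len(y), dtype=bool)
for i, target in enumerate(y):
    if y[nn_index[i]] != target:            # different classes...
        if nn_index[nn_index[i]] == i:      # ...and neighbours of each other
            links[i] = True

print(links)  # -> [ True  True False]
```

Only the mutual pair (A, B) is flagged; the one-sided pair (C, B) is not a Tomek link.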
