
Commit 438c19c

final edits
1 parent 3960305 commit 438c19c

File tree

3 files changed (+10, -7 lines)


doc/under_sampling.rst

Lines changed: 5 additions & 5 deletions
@@ -319,11 +319,11 @@ iteratively decide if a sample should be removed or not
 3. Train a 1-KNN on `C`.
 4. Go through the samples in set :math:`S`, sample by sample, and classify each one
    using a 1 nearest neighbor rule (trained in 3).
-5. If the sample is misclassified, add it to :math:`C`, otherwise do nothing.
+5. If the sample is misclassified, add it to :math:`C`, and go to step 6.
 6. Repeat steps 3 to 5 until all observations in `S` have been examined.

 The final dataset is `S`, containing all observations from the minority class and
-those from the majority that were miss-classified by the 1-KNN algorithms.
+those from the majority that were miss-classified by the successive 1-KNN algorithms.

 The :class:`CondensedNearestNeighbour` can be used in the following manner::
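
The condensation loop described in the hunk above can be sketched in a few lines of plain NumPy and scikit-learn. This is a hedged illustration only, not the library's implementation: the toy arrays `X` and `y`, the choice of class 0 as minority, and the single random seed sample are all assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
# Toy data (assumption): class 0 is the minority (kept whole),
# class 1 is the majority class being condensed.
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(3, 1, (40, 2))])
y = np.array([0] * 10 + [1] * 40)

# C starts with all minority samples plus one majority sample (steps 1-2).
maj = np.flatnonzero(y == 1)
C = list(np.flatnonzero(y == 0)) + [maj[0]]
S = list(maj[1:])

for idx in S:
    # Step 3: (re)train a 1-KNN on the current C.
    knn = KNeighborsClassifier(n_neighbors=1).fit(X[C], y[C])
    # Steps 4-5: a misclassified sample is added to C; otherwise move on.
    if knn.predict(X[idx][None, :])[0] != y[idx]:
        C.append(idx)

X_res, y_res = X[C], y[C]
print(f"kept {len(C)} of {len(X)} samples")
```

The retraining inside the loop is what step 6 ("repeat steps 3 to 5") amounts to: each newly added hard sample immediately influences the classification of the remaining ones.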

@@ -334,7 +334,7 @@ The :class:`CondensedNearestNeighbour` can be used in the following manner::
 [(0, 64), (1, 24), (2, 115)]

 However, as illustrated in the figure below, :class:`CondensedNearestNeighbour`
-is sensitive to noise and will add noisy samples.
+is sensitive to noise and may select noisy samples.

 In an attempt to remove noisy observations, :class:`OneSidedSelection`
 will first find the observations that are hard to classify, and then will use
@@ -345,8 +345,8 @@ will first find the observations that are hard to classify, and then will use
 2. Add a sample from the targeted class (class to be under-sampled) in
    :math:`C` and all other samples of this class in a set :math:`S`.
 3. Train a 1-KNN on `C`.
-4. Using a 1 nearest neighbor rule trained in 3, classify all samples
-   in set :math:`S`.
+4. Using a 1 nearest neighbor rule trained in 3, classify all samples in
+   set :math:`S`.
 5. Add all misclassified samples to :math:`C`.
 6. Remove Tomek Links from :math:`C`.
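
The :class:`OneSidedSelection` steps above differ from :class:`CondensedNearestNeighbour` in that the 1-KNN is trained only once. A minimal sketch of steps 1-5, under the same toy-data assumptions as before (class labels, sample counts, and seed are all hypothetical), with the Tomek-link pruning of step 6 deliberately omitted:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
# Toy data (assumption): class 0 is the minority,
# class 1 is the targeted class to be under-sampled.
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(3, 1, (40, 2))])
y = np.array([0] * 10 + [1] * 40)

maj = np.flatnonzero(y == 1)
C = list(np.flatnonzero(y == 0)) + [maj[0]]          # steps 1-2
S = maj[1:]

# Step 3: train the 1-KNN once on C.
knn = KNeighborsClassifier(n_neighbors=1).fit(X[C], y[C])
# Step 4: classify all of S in one pass.
pred = knn.predict(X[S])
# Step 5: keep only the misclassified (hard) samples.
C += list(S[pred != y[S]])

# Step 6 (Tomek-link removal) would prune C further; omitted here.
X_res, y_res = X[C], y[C]
print(f"kept {len(C)} of {len(X)} samples (before Tomek-link removal)")
```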

imblearn/under_sampling/_prototype_selection/_condensed_nearest_neighbour.py

Lines changed: 1 addition & 1 deletion
@@ -171,7 +171,7 @@ def _fit_resample(self, X, y):
             # Check each sample in S if we keep it or drop it
             for idx_sam, (x_sam, y_sam) in enumerate(zip(S_x, S_y)):

-                # Do not select sample which are already well classified
+                # Do not select samples which are already well classified
                 # (or were already selected -randomly- to be part of C)
                 if idx_sam in good_classif_label:
                     continue

imblearn/under_sampling/_prototype_selection/_tomek_links.py

Lines changed: 4 additions & 1 deletion
@@ -111,12 +111,15 @@ def is_tomek(y, nn_index, class_type):
         class_excluded = [c for c in np.unique(y) if c not in class_type]

         # there is a Tomek link between two samples if they are nearest
-        # neighbors and from a different class.
+        # neighbors of each other, and from a different class.
         for index_sample, target_sample in enumerate(y):
             if target_sample in class_excluded:
                 continue

             if y[nn_index[index_sample]] != target_sample:
+                # corroborate that they are neighbours of each other:
+                # (if A's closest neighbour is B, but B's closest neighbour
+                # is C, then A and B are not a Tomek link)
                 if nn_index[nn_index[index_sample]] == index_sample:
                     links[index_sample] = True
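
The mutual-neighbour condition the new comment describes can be verified on a tiny example. The three 1-D points below are hypothetical toy data; `nn_index` is rebuilt here with scikit-learn rather than taken from the library's internals. Point C's nearest neighbour is B, but B's nearest neighbour is A, so the pair (B, C) is rejected even though their classes differ:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy data (assumption): A=0.0 and B=1.0 are mutual nearest neighbours
# of different classes; C=2.5's nearest neighbour is B, but not vice versa.
X = np.array([[0.0], [1.0], [2.5]])
y = np.array([0, 1, 0])

# nn_index[i] = index of the nearest *other* sample
# (column 0 of kneighbors is the sample itself, at distance 0).
nn = NearestNeighbors(n_neighbors=2).fit(X)
nn_index = nn.kneighbors(X, return_distance=False)[:, 1]

links = np.zeros(len(y), dtype=bool)
for i, target in enumerate(y):
    if y[nn_index[i]] != target:            # different classes...
        if nn_index[nn_index[i]] == i:      # ...and neighbours of each other
            links[i] = True

print(links)  # -> [ True  True False]
```

Only the mutual pair (A, B) is flagged; the one-sided pair (C, B) is not a Tomek link.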
