Skip to content

[BUG] ENN, RENN and AlKNN train KNN with different K from init param #851

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
solegalli opened this issue Aug 5, 2021 · 2 comments
Closed

Comments

@solegalli
Copy link
Contributor

solegalli commented Aug 5, 2021

Describe the bug

ENN and RENN accept integers in the init parameter n_neighbors, to indicate the number of neighbours for the KNN algorithm
The default is 3. However, a 4 KNN is trained instead.

Steps/Code to Reproduce

import pandas as pd
from sklearn.datasets import make_classification
from imblearn.under_sampling import EditedNearestNeighbours

def make_data(sep):
    
    # returns arrays
    X, y = make_classification(n_samples=1000,
                           n_features=2,
                           n_redundant=0,
                           n_clusters_per_class=1,
                           weights=[0.99],
                           class_sep=sep,# how separate the classes are
                           random_state=1)
    
    # trasform arrays into pandas df and series
    X = pd.DataFrame(X, columns =['varA', 'varB'])
    y = pd.Series(y)
    
    return X, y

X, y = make_data(sep=2)

enn = EditedNearestNeighbours(
    sampling_strategy='auto',  
    n_neighbors=3, **# I expect a 3 KNN**
    kind_sel='all', 
    n_jobs=4,
)  

X_resampled, y_resampled = enn.fit_resample(X, y)

enn.nn_
>>> NearestNeighbors(n_jobs=4, n_neighbors=4)

Expected Results

enn.nn_

NearestNeighbors(n_jobs=4, n_neighbors=3)

Actual Results

NearestNeighbors(n_jobs=4, n_neighbors=4)

@glemaitre
Copy link
Member

This is not a bug here. The thing is that a data sample is nearest neighbours of itself. Thus to respect the algorithm you need to add +1.

@chkoar is probably when explaining that we should not expose the internal in this case.

@solegalli
Copy link
Contributor Author

solegalli commented Aug 5, 2021

Yes, I realised that after some digging of the source code :_(.

I am updating the docstrings in #850. Should I try to clarify somehow?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants