This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Scale var/std #616

Merged
PokhodenkoSA merged 1 commit into IntelPython:master on Feb 19, 2020

Conversation

PokhodenkoSA
Contributor

@PokhodenkoSA commented on Feb 17, 2020

Only var is implemented so far.
This PR is based on #610 because it uses the parallel nanmean.

[images]

std gets the speedup automatically because Series.std is implemented via Series.var. @densmirn thank you :)
[image]
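
As a rough illustration of why the speedup carries over (a hypothetical sketch, not the actual SDC code; the class and the skipna parameter are assumed for illustration):

import numpy


class Series:
    # Hypothetical minimal Series: std delegates to var,
    # so any speedup in var automatically benefits std.
    def __init__(self, data):
        self._data = numpy.asarray(data, dtype=numpy.float64)

    def var(self, skipna=True):
        data = self._data
        if skipna:
            data = data[~numpy.isnan(data)]
        if data.size < 1:
            return numpy.nan
        return numpy.var(data)

    def std(self, skipna=True):
        # std is implemented via var
        return self.var(skipna=skipna) ** 0.5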

@AlexanderKalistratov
Collaborator

You could optimize it further. The formula for variance is:
(1/n)·∑(aᵢ - mean)² = (1/n)·∑(aᵢ² - 2·aᵢ·mean + mean²) = (1/n)·(∑aᵢ² - 2·mean·∑aᵢ + n·mean²)
where the mean is:
mean = (1/n)·∑aᵢ
Substituting the mean gives:
(1/n)·(∑aᵢ² - (2/n)·(∑aᵢ)² + (1/n)·(∑aᵢ)²) = (1/n)·(∑aᵢ² - (1/n)·(∑aᵢ)²)
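
As a quick sanity check with a = [1, 2, 3]: n = 3, ∑aᵢ = 6, ∑aᵢ² = 14, so the right-hand side gives (1/3)·(14 - 36/3) = 2/3 ≈ 0.667, which matches numpy.var([1, 2, 3]).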

So, you could implement variance as:

# single pass over the data: accumulate count, sum and sum of squares
square_sum = 0.
sum_a = 0.
total_count = 0
for i in prange(len(self._data)):
    a = self._data[i]
    if not numpy.isnan(a):
        square_sum += a * a
        sum_a += a
        total_count += 1

if total_count < 1:
    return numpy.nan

# var = (1/n) * (sum(a_i^2) - (1/n) * (sum(a_i))^2)
return (square_sum - sum_a * sum_a / total_count) / total_count

You could also look at covariance as an example.

Also, I haven't validated the final formula, so there could be errors.
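
For reference, here is a minimal self-contained sketch of this single-pass variance, parallelized with numba.prange; the function and variable names are mine and this is not the SDC implementation:

import numpy
from numba import njit, prange


@njit(parallel=True)
def parallel_nanvar(data):
    # single pass: accumulate count, sum and sum of squares, skipping NaNs
    square_sum = 0.
    total_sum = 0.
    total_count = 0
    for i in prange(len(data)):
        a = data[i]
        if not numpy.isnan(a):
            # scalar += inside prange is treated as a reduction by numba
            square_sum += a * a
            total_sum += a
            total_count += 1

    if total_count < 1:
        return numpy.nan

    # population variance: (1/n) * (sum(a_i^2) - (1/n) * (sum(a_i))^2)
    return (square_sum - total_sum * total_sum / total_count) / total_count


a = numpy.array([1., 2., numpy.nan, 3., 4.])
print(parallel_nanvar(a))   # 1.25
print(numpy.nanvar(a))      # 1.25 (numpy's reference, ddof=0)

Dividing by total_count gives the population variance (ddof=0), which matches numpy.nanvar's default; pandas' Series.var and Series.std default to ddof=1, so a real implementation would also need to handle that.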

Add perf test for var with skipna=True

Add numpy_like var

Add numpy_like nanmean

Add test for numpy_like.nanvar

Add perf test for numpy_like.nanvar

Add perf test for Series.std(skipna=True)
@PokhodenkoSA
Contributor Author

Also, I haven't validated the final formula, so there could be errors.

I will implement it in a separate PR.

@PokhodenkoSA merged commit a8e9d26 into IntelPython:master on Feb 19, 2020