
VATIN tweaks #316


Open: wants to merge 4 commits into master


unho (Contributor) commented Sep 6, 2022

No description provided.

unho force-pushed the vatin-tweaks branch 2 times, most recently from 7a3409a to 63bcda8 on September 6, 2022 18:52

arthurdejong (Owner) commented:

It might be a better idea to provide a stdnum.tin module that transparently supports multiple international tax identification numbers instead of using the stdnum.vatin module for that. VAT numbers are often not general tax identification numbers and sometimes are only relevant for VAT and not for other taxes.

A problem is that it is pretty difficult to categorise all personal and company identification numbers into one scheme. For example, sometimes you may want to validate that a number is any kind of identifier, sometimes you may want to ensure that it is assigned to a company in some country or instead identifies an individual, and sometimes you may accept one number over another (e.g. a driver's license number may not be sufficient but an ID-card number might be). Companies may have various numbers (e.g. VAT number, income tax number, chamber of commerce number), all of which may be different and appropriate for different circumstances.

I'm also thinking about reviewing the existing module to check that the countries that currently provide a "vat" alias are countries that actually have a VAT-based tax system.

Ideas on clarifying the use cases are very welcome.

unho (Contributor, Author) commented Oct 16, 2022

I do realize all of that. Please let me explain the rationale behind stdnum.vatin.

Some time ago there was a Python package called vatnumber. It provided validation for VAT numbers, which in a lot of countries are the same as the TIN. In fact, for plenty of countries it relied on python-stdnum. vatnumber was used by plenty of other software.

In my particular case I discovered it through Odoo (formerly OpenERP), exactly when I had to fix the validation for some country. At that moment I discovered that vatnumber was no longer actively maintained, so I couldn't contribute the fix. After some research I also noticed that python-stdnum was being maintained, and that it not only already had some of the fixes I wanted to contribute, but also had validation for several other countries. At that point I started contributing, and also creating tickets for expanding the functionality.

Some time later, Odoo replaced vatnumber with python-stdnum. In fact, its code tries to use any vat module located within any country package in python-stdnum (https://github.com/odoo/odoo/blob/16.0/addons/base_vat/models/res_partner.py#L99). I believe this happens partly because Odoo calls this the vat field, and partly because they don't realize that in some countries the VAT number is a different one.
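A minimal sketch of the dispatch pattern described above: take the first two letters as a country code, look for a `vat` entry in the matching `stdnum.<cc>` package and call its `is_valid()`. This is not Odoo's actual code; `find_vat_module()` and `check_vat()` are hypothetical names, and the trailing underscore for country codes that are Python keywords (e.g. `stdnum.in_`) is an assumption about python-stdnum's package layout.

```python
import importlib
from types import ModuleType
from typing import Optional

# Assumption: country codes that are Python keywords get a trailing
# underscore in the package name (e.g. stdnum.in_ for India).
_KEYWORD_CODES = {'if', 'in', 'is'}


def find_vat_module(country_code: str) -> Optional[ModuleType]:
    """Return the country's `vat` module (or alias), or None if there is none."""
    cc = country_code.lower()
    if cc in _KEYWORD_CODES:
        cc += '_'
    try:
        package = importlib.import_module('stdnum.' + cc)
    except ImportError:
        return None  # no package for this country (or stdnum not installed)
    # `vat` may be an alias set in the package's __init__ ...
    module = getattr(package, 'vat', None)
    if module is None:
        # ... or a real submodule that has not been imported yet.
        try:
            module = importlib.import_module('stdnum.' + cc + '.vat')
        except ImportError:
            return None
    return module


def check_vat(number: str) -> Optional[bool]:
    """Validate "CC<number>"; return None when no vat module exists for CC."""
    module = find_vat_module(number[:2])
    if module is None:
        return None
    return module.is_valid(number[2:])
```

A country whose package has no `vat` entry simply falls through to `None`, which is exactly the ambiguity this thread is about: the caller cannot tell "not a VAT country" from "not supported".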

In fact, I noticed this distinction when I looked into the TIN for the United States, where I found out that they don't have the concept of a VAT number, and that their TIN is not even "universal". To my dismay, the further I looked into other countries, the more difficult it became to tell which number to use. Then I stumbled upon https://en.wikipedia.org/wiki/VAT_identification_number, which kind of mixes these two concepts.

That happened more or less when I really wanted python-stdnum to provide a simple method for validating a "VAT number" in the way Odoo stores it, as vatnumber used to. And since nobody really agreed on having a proper TIN or a VAT number, I settled on using "VATIN". Thus stdnum.vatin.

So, to answer your question: sure, we can add a stdnum.tin module. But what happens if a country doesn't have a proper TIN? And even if we go this route, we will have to verify again that each number we have is the proper TIN, in order to untangle the modules and their aliases.

Alternatively, we can undo some of the vat aliases and use something like vatin. This will certainly break Odoo's code, but it is more semantic, and we don't have to worry so much about whether it is a TIN or a VAT number, in cases like the United States or maybe India.

Or we could try to convince those countries to use a single unified number to identify companies, people, etc. for all purposes. But I don't see that happening 😄. In fact, even countries like Spain have a separate company registration number, which is nearly useless since we use the NIF for everything, but it still exists.

unho (Contributor, Author) commented Oct 16, 2022

Another thing, @arthurdejong: whether we keep this module unchanged or it gets renamed, I still think this particular PR should be merged 😄

arthurdejong pushed a commit that referenced this pull request Nov 13, 2022
arthurdejong (Owner) commented Nov 13, 2022

Thanks for providing the detailed response. I will really have to give some more thought to how to get to some kind of longer-term solution. For now, I think it would be reasonable for the vatin module to accept any number that is listed on https://en.wikipedia.org/wiki/VAT_identification_number, or that any other source says is used for VAT purposes in the country.

Btw, I merged one of the commits in this PR as 7348c7a. I'll have to look at the others in more detail (sorry it is taking so long). I'll have to check, but I thought Python's re module automatically keeps a cache of compiled regexes. I'm also not too happy with the validate() function raising exceptions that are not a subclass of ValidationError, because that is the documented API (see https://arthurdejong.org/python-stdnum/doc/).
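For reference on the regex point: CPython's module-level functions such as `re.match()` compile the pattern on first use and keep the compiled object in an internal cache (its size is an implementation detail), so precompiling with `re.compile()` mostly saves a cache lookup and documents intent. The pattern below is hypothetical and only for illustration; both functions behave identically.

```python
import re

# Hypothetical pattern: two-letter country code followed by 2-12 characters.
PATTERN = r'^[A-Z]{2}[0-9A-Z+*.]{2,12}$'

# Option 1: compile once at import time and reuse the pattern object.
_COMPILED = re.compile(PATTERN)


def matches_precompiled(number: str) -> bool:
    return _COMPILED.match(number) is not None


# Option 2: pass the pattern string every time; re compiles it on the
# first call and serves later calls from its internal cache.
def matches_uncompiled(number: str) -> bool:
    return re.match(PATTERN, number) is not None
```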

@unho unho mentioned this pull request Apr 30, 2023
EmberCraze added a commit to gigapay/python-stdnum that referenced this pull request Apr 10, 2025
* Switch from nose to pytest

Nose hasn't seen a release since 2015 and sadly doesn't work with Python
3.10.

See nose-devs/nose#1099

* Upgrade to CodeQL Action v2

https://github.blog/changelog/2022-04-27-code-scanning-deprecation-of-codeql-action-v1/

* Fix flake8 error

This stops using not as a function and hopefully also makes the logic
clearer.

* Upgrade GitHub Actions

Update checkout to v3 (no relevant changes) and setup-python to v4
(changes the names for pypy versions).

* Add support for Python 3.10

* Put long line flake8 ignores in files instead of globally

We have some long URLs in the code (mostly in docstrings) and wrapping
them does not improve readability (and is difficult in docstrings) so
the E501 ignore is now put inside each file instead of globally.

Closes arthurdejong/python-stdnum#302

* Fix small typo

Improper inflection of plurals.

Closes arthurdejong/python-stdnum#299

* Add Czech bank account numbers

Closes arthurdejong/python-stdnum#295
Closes arthurdejong/python-stdnum#296

* Use str.zfill() for padding leading zeros

* Add extra court alias for german Handelsregisternummer

Charlottenburg (Berlin) is a valid court representation for Berlin
(Charlottenburg).

See https://www.northdata.com/VRB+Service+GmbH,+Berlin/Amtsgericht+Charlottenburg+%28Berlin%29+HRB+103587+B

Closes arthurdejong/python-stdnum#298

* Remove redundant steps with tox_job

This also switches the other Tox jobs to use the latest Python 3.x
interpreter.

Closes arthurdejong/python-stdnum#305

* Update ISIL download URL

* Provide a timeout to all download scripts

* Update names of Wikipedia pages with IMSI codes

* Ignore invalid downloaded country codes

The page currently lists a country without a country code (is listed as
"-"). This also ensures that lists of country codes are handled
consistently.

* Do not print trailing space

* Update database files

* Fix German OffeneRegister company registry URL

* Update EU VAT Vies test with new number

The number used before was apparently no longer valid.

* Add support for Tunisia TIN

Closes arthurdejong/python-stdnum#317
Closes arthurdejong/python-stdnum#309

* Add Kenyan TIN

Closes arthurdejong/python-stdnum#300
Closes arthurdejong/python-stdnum#310

* Add support for Morocco TIN

Closes arthurdejong/python-stdnum#226
Closes arthurdejong/python-stdnum#312

* Add Algerian NIF number

This currently only checks the length and whether it only contains
digits because little could be found on the structure of the number or
whether there are any check digits.

Closes arthurdejong/python-stdnum#313
Closes arthurdejong/python-stdnum#307

* Fix a couple typos found by codespell

Closes arthurdejong/python-stdnum#333

* Add North Macedonian ЕДБ

Note that this implementation is mostly based on unofficial sources
describing the format, which match the hundreds of examples found
online.
https://forum.it.mk/threads/modularna-kontrola-na-embg-edb-dbs-itn.15663/#post-187048

Also note that the algorithm for the check digit was tested on all found
examples, and it doesn't work for all of them, but those failing
examples don't seem to be valid according to the official online search.

Closes arthurdejong/python-stdnum#330
Closes arthurdejong/python-stdnum#222

* Add Faroe Islands V-number

Closes arthurdejong/python-stdnum#323
Closes arthurdejong/python-stdnum#219

* Add support for Montenegro TIN

Closes arthurdejong/python-stdnum#331
Closes arthurdejong/python-stdnum#223

* Add CAS Registry Number

* Add support for Ghana TIN

Closes arthurdejong/python-stdnum#326
Closes arthurdejong/python-stdnum#262

* Support running tests with PyPy 2.7

This also applies the fix from cfc80c8 from Python 2.7 to PyPy.

* Update Fødselsnummer test case for date in future

The future was now. This problem was pushed forwards to October 2039.

* Remove duplicate CAS Registry Number

The recently added stdnum.cas module was already available as the
stdnum.casrn module.

Reverts acb6934

* Improve validation of CAS Registry Number

This ensures that a leading 0 is treated as invalid.
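A minimal sketch of CAS Registry Number validation including the leading-zero rule, written for illustration rather than copied from `stdnum.casrn` (the function names are hypothetical). The check digit is the sum of the other digits weighted 1, 2, 3, ... from the right, modulo 10.

```python
import re

# Groups: 2-7 digits (no leading zero), 2 digits, 1 check digit.
_CAS_RE = re.compile(r'^([1-9][0-9]{1,6})-([0-9]{2})-([0-9])$')


def calc_check_digit(digits: str) -> str:
    """Weight the digits 1, 2, 3, ... from the right; sum modulo 10."""
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return str(total % 10)


def is_valid_casrn(number: str) -> bool:
    match = _CAS_RE.match(number.strip())
    if not match:
        return False  # wrong layout or a leading 0 in the first group
    first, second, check = match.groups()
    return calc_check_digit(first + second) == check
```

For water, 7732-18-5, the weighted sum is 8*1 + 1*2 + 2*3 + 3*4 + 7*5 + 7*6 = 105, giving check digit 5, while 07732-18-5 is rejected by the pattern.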

* Remove unused import

Fixes 09d595b

* Switch to parse_qs() from urllib.parse

The function was removed from the cgi module in Python 3.8.

* Switch to escape() from html

The function was removed from the cgi module in Python 3.8.
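The standard-library replacements for the two removed `cgi` functions are `urllib.parse.parse_qs()` and `html.escape()`. A minimal example of both; the query string is made up.

```python
from html import escape
from urllib.parse import parse_qs

# parse_qs() maps each key to a list of all its values.
params = parse_qs('number=123&number=456&format=json')
numbers = params.get('number', [])

# escape() replaces &, <, > and (by default) quotes with HTML entities.
safe = escape('<b>"NL" & "BE"</b>')
```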

* Support "I" and "O" in CUSIP number

It is unclear why these letters were considered invalid at the time of
the implementation.

This also reduces the test set a bit while still covering most cases.

Closes arthurdejong/python-stdnum#337
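A sketch of the CUSIP check digit over an alphabet that includes I and O, written for illustration and not copied from `stdnum.cusip`: each character maps to its index in `0-9A-Z*@#`, the value at every second position is doubled, and the decimal digits of all values are summed.

```python
# Character values: 0-9 -> 0-9, A-Z -> 10-35 (I and O included), * @ # -> 36-38.
_ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ*@#'


def calc_check_digit(base: str) -> str:
    """Compute the 9th CUSIP character from the first 8."""
    digits = ''
    for i, char in enumerate(base.upper()):
        value = _ALPHABET.index(char)
        # double the value at every second position (2nd, 4th, ...)
        digits += str(value * 2 if i % 2 else value)
    # sum the individual decimal digits of all (possibly two-digit) values
    return str((10 - sum(int(d) for d in digits)) % 10)


def is_valid_cusip(number: str) -> bool:
    number = number.strip().upper()
    if len(number) != 9 or any(c not in _ALPHABET for c in number[:8]):
        return False
    return calc_check_digit(number[:8]) == number[8]
```

With this alphabet both 037833100 and 17275R102 validate, and a base containing I or O gets a check digit like any other.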

* Add a check_uid() function to the stdnum.ch.uid module

This function can be used to perform a lookup of organisation
information by the Swiss Federal Statistical Office web service.

Related to arthurdejong/python-stdnum#336

* Make all exceptions inherit from ValueError

All the validation exceptions (subclasses of ValidationError) are raised
when a number is provided with an inappropriate value.

* Pad with zeroes in a more readable manner

Closes arthurdejong/python-stdnum#340

* Use HTTPS in URLs where possible

* Ensure we always run flake8-bugbear

This assumes that we no longer use Python 2.7 for running the flake8
tests.

* Add support for Slovenian EMŠO (Unique Master Citizen Number)

Closes arthurdejong/python-stdnum#338

* Add Pakistani ID card number

Based on the implementation provided by Quantum Novice (Syed Haseeb
Shah).

Closes arthurdejong/python-stdnum#306
Closes arthurdejong/python-stdnum#304

* vatin: Add a few more tests for is_valid

See arthurdejong/python-stdnum#316

* Pick up custom certificate from script path

This ensures that the script can be run from any directory.

Fixes c4ad714

* Increase timeout for CN Open Data download

It seems that raw.githubusercontent.com can be extremely slow.

* Update German OffeneRegister lookup data format

It appears that the data structure at OffeneRegister has changed which
requires a different query. Data is returned in a different structure.

* Update database files

* Get files ready for 1.18 release

* Avoid newer flake8

The new 6.0.0 contains a number of backwards incompatible changes
for which plugins need to be updated and configuration needs to be
updated.

Sadly the maintainer no longer accepts contributions or discussion.
See PyCQA/flake8#1760

* Fix a typo

Closes arthurdejong/python-stdnum#341

* Run Python 3.5 and 3.6 GitHub tests on older Ubuntu

The ubuntu-latest label now points to ubuntu-22.04 instead of
ubuntu-20.04.

This also switches the PyPy version to test with to 3.9.

* Fix typos found by codespell

Closes arthurdejong/python-stdnum#344

* Add initial CONTRIBUTING.md file

Initial description of the information needed for adding new number
formats and some coding and testing guidelines.

* Add support for Egypt TIN

This also converts Arabic digits to ASCII digits.

Closes arthurdejong/python-stdnum#225
Closes arthurdejong/python-stdnum#334
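One way to normalise Arabic-Indic digits (U+0660 to U+0669) and the Extended Arabic-Indic digits used in Persian and Urdu (U+06F0 to U+06F9) to ASCII is a `str.translate()` table. This is a sketch; it is not necessarily how the Egypt module does it, and `to_ascii_digits()` is a hypothetical name.

```python
# Build a translation table covering both Arabic digit ranges.
_ARABIC_DIGITS = {}
for offset in range(10):
    _ARABIC_DIGITS[0x0660 + offset] = ord('0') + offset  # Arabic-Indic
    _ARABIC_DIGITS[0x06F0 + offset] = ord('0') + offset  # Extended Arabic-Indic


def to_ascii_digits(number: str) -> str:
    """Replace Arabic digits with ASCII digits, leaving other characters alone."""
    return number.translate(_ARABIC_DIGITS)
```

`int()` already accepts any Unicode decimal digit, but regular expressions such as `[0-9]` and other ASCII-based format checks do not, which is why normalising early is convenient.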

* Extend number properties to show in online check

This also ensures that flake8 is run on the WSGI script.

* Fix typo in UEN docstring

* Fix Albanian tax number validation

This extends the description of the Albanian NIPT (NUIS) number with
information on the structure of the number. The first character was
previously limited to J through L, but this letter indicates a decade
and the number is also used for individuals, where it indicates a
birth date.

Thanks Julien Launois for pointing this out.

Source: https://www.oecd.org/tax/automatic-exchange/crs-implementation-and-assistance/tax-identification-numbers/Albania-TIN.pdf

Fixes 3db826c
Closes arthurdejong/python-stdnum#402

* Update IBAN database file

Closes arthurdejong/python-stdnum#409

* Extend date parsing in GS1-128

Some new AIs have new date formats or have changed the way optional
components of formats are defined.

* Fix date formatting on PyPy 2.7

The original way of calling strftime was likely an artifact of Python
2.6 support.

Fixes 7e84c05

* Add support for Python 3.11

* Ensure flake8 is run on all Python files

This also fixes code style issues in the Sphinx configuration file.

* Add get_county() function to Romanian CNP

This also validates the county part of the number.

Closes arthurdejong/python-stdnum#407

* Add functionality to get gender from Belgian National Number

This also extends the documentation for the number.

Closes https://github.com/arthurdejong/python-stdnum/pull/347/files

* Add support for Finland HETU new century indicating signs

More information at https://dvv.fi/en/reform-of-personal-identity-code

Closes arthurdejong/python-stdnum#396
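Per the DVV reform page linked above (as I understand it), since 2023 the century sign can also be Y, X, W, V or U for people born in the 1900s and B, C, D, E or F for those born in the 2000s, alongside the existing +, - and A. A lookup table captures this; `birth_year()` is a hypothetical helper that does not verify the check character.

```python
# Century sign -> first year of the century, including the 2023 additions.
_CENTURY_BY_SIGN = {'+': 1800}
_CENTURY_BY_SIGN.update(dict.fromkeys('-YXWVU', 1900))
_CENTURY_BY_SIGN.update(dict.fromkeys('ABCDEF', 2000))


def birth_year(hetu: str) -> int:
    """Return the birth year from a code laid out as DDMMYY<sign>NNNC."""
    sign = hetu[6].upper()
    if sign not in _CENTURY_BY_SIGN:
        raise ValueError('unknown century sign: %r' % sign)
    return _CENTURY_BY_SIGN[sign] + int(hetu[4:6])
```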

* Add Spanish postcode validator

Closes arthurdejong/python-stdnum#401

* Add support for Guinea TIN

Closes arthurdejong/python-stdnum#384
Closes arthurdejong/python-stdnum#386

* Add automated checking for correct license header

* Minor ISSN and ISBN documentation fixes

Fix a comment that claimed incorrect ISSN length and use slightly more
consistent terminology around check digits in ISSN and ISBN.

Closes arthurdejong/python-stdnum#415

* Handle (partially) unknown birthdate of Belgian National Number

This adds documentation for the special cases regarding birth dates
embedded in the number, allows for date parts to be unknown and adds
functions for getting the year and month.

Closes arthurdejong/python-stdnum#416

* Run Python 2.7 tests in a container for GitHub Actions

See actions/setup-python#672

* Add Belgian BIS Number

Closes arthurdejong/python-stdnum#418

* Validate first digit of Canadian SIN

See http://www.straightlineinternational.com/docs/vaildating_canadian_sin.pdf
See https://lists.arthurdejong.org/python-stdnum-users/2023/msg00000.html

* Fix file headers

This improves consistency across files and fixes some files that had an
incorrect file name reference.

* Extend license check to file header check

This also checks that the file name referenced in the file header is
correct.

* Add Slovenian Corporate Registration Number

Closes arthurdejong/python-stdnum#414

* Validate European VAT numbers with EU or IM prefix

Closes arthurdejong/python-stdnum#417

* Remove EU NACE update script

The website that publishes the NACE catalogue has changed and a complete
re-write of the script would be necessary. The data file hasn't changed
since 2017 so is also unlikely to change until it is going to be
replaced by NACE rev. 2.1 in 2025.

See https://ec.europa.eu/eurostat/web/nace

The NACE rev 2 specification can now be found here:
https://showvoc.op.europa.eu/#/datasets/ESTAT_Statistical_Classification_of_Economic_Activities_in_the_European_Community_Rev._2/data

The NACE rev 2.1 specification can now be found here:
https://showvoc.op.europa.eu/#/datasets/ESTAT_Statistical_Classification_of_Economic_Activities_in_the_European_Community_Rev._2.1._%28NACE_2.1%29/data

In both cases a ZIP file with RDF metadata can be downloaded (but the
web application also exposes some simpler JSON APIs).

* Update database files

This also modifies the OUI update script because the website has changed
to HTTPS and is sometimes very slow.

The Belgian Commerzbank no longer has a registration, and one of the
bank account numbers in the tests used that bank.

* Replace test number for German company registry

The number seems to be no longer valid breaking the online tests.

* Update Belarusian UNP online check

Some small details of the API for the online check for Belarusian UNP
numbers at https://www.portal.nalog.gov.by/grp/getData have changed.

* Rename license_file option in setup.cfg

It seems the old option wasn't working with all versions of setuptools
anyway.

See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html

* Avoid the deprecated assertRegexpMatches function

* Use importlib.resources in place of deprecated pkg_resources

Closes arthurdejong/python-stdnum#412
Closes arthurdejong/python-stdnum#413
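A minimal sketch of reading a packaged data file with `importlib.resources` instead of `pkg_resources` (requires Python 3.9+ for `files()`). The standard-library `json` package is used only so the example runs anywhere; python-stdnum would read its own `.dat` files, and `read_package_text()` is a hypothetical helper.

```python
from importlib import resources


def read_package_text(package: str, name: str) -> str:
    """Read a data file shipped inside an installed package."""
    return resources.files(package).joinpath(name).read_text(encoding='utf-8')
```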

* Remove obsolete intermediate certificate

The portal.nalog.gov.by website no longer has an incomplete certificate
chain.

* Ensure all files are included in source archive

Fixes b1dc313
Fixes 90044e2

* Get files ready for 1.19 release

* Add support for Python 3.12

* Fix typo (thanks Александр Кизеев)

* Ensure EU VAT numbers don't accept duplicate country codes

* Add British Columbia PHN

Closes arthurdejong/python-stdnum#421

* Add European Community (EC) Number

Closes arthurdejong/python-stdnum#422

* Fix vatin number compacting for "EU" VAT numbers

Thanks Davide Walder for finding this.

Closes arthurdejong/python-stdnum#427

* Improve French NIF validation (checksum)

The last 3 digits are a checksum: the first 10 digits modulo 511.
https://ec.europa.eu/taxation_customs/tin/specs/FS-TIN%20Algorithms-Public.docx

Closes arthurdejong/python-stdnum#426
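The checksum rule as a worked example: for 3023217600053, 3023217600 mod 511 = 53, so the last three digits must be 053. A minimal sketch that checks only length, digits and this checksum (any other format rules are ignored; the function names are hypothetical):

```python
def calc_check_digits(number: str) -> str:
    """Return the 3-digit check number: first 10 digits modulo 511."""
    return '%03d' % (int(number[:10]) % 511)


def is_valid_nif(number: str) -> bool:
    number = number.replace(' ', '')
    if len(number) != 13 or any(c not in '0123456789' for c in number):
        return False
    return number[10:] == calc_check_digits(number)
```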

* Fix Ukrainian EDRPOU check digit calculation

This fixes the case where the weighted sum would be 10 which should
result in a check digit of 0.

Closes arthurdejong/python-stdnum#429

* Add Indian virtual identity number

Closes arthurdejong/python-stdnum#428

* Use HTTPS in URLs where possible

* Switch to using openpyxl for parsing XLSX files

The xlrd library has dropped support for parsing XLSX files. We still use xlrd
for update/be_banks.py because they use the classic XLS format and
openpyxl does not support that format.

* Add update-dat tox target for convenient data file updating

* Update database files

The Belgian bpost bank no longer has a registration and a few bank
account numbers in the tests that used that bank were removed.

Also updates the update/gs1_ai.py script to handle the new format of the
data published by GS1. Also update the GS1-128 module to handle some
different date formats.

The Pakistan entry was kept in the stdnum/iban.dat file because the PDF
version of the IBAN Registry still contains the country.


* Get files ready for 1.20 release

* Drop support for Python 3.5

We don't have an easy way to test with Python 3.5 any more.

* Add support for Indonesian NIK

* Fix a typo

Closes arthurdejong/python-stdnum#443

* Update Irish PPS validator to support new numbers

See https://www.charteredaccountants.ie/News/b-range-pps-numbers

Closes arthurdejong/python-stdnum#440
Closes arthurdejong/python-stdnum#441

* Update Czech database files

Closes arthurdejong/python-stdnum#439
Closes arthurdejong/python-stdnum#435

* Adjust Swiss uid module to accept numbers without CHE prefix

Closes arthurdejong/python-stdnum#437
Closes arthurdejong/python-stdnum#423

* Support 16 digit Indonesian NPWP numbers

The Indonesian NPWP is being switched from 15 to 16 digits. The number
is now the NIK for Indonesian citizens and the old format with a leading
0 for others (organisations and non-citizens).

See https://www.grantthornton.co.id/insights/global-insights1/updates-regarding-the-format-of-indonesian-tax-id-numbers/

Closes arthurdejong/python-stdnum#432
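A sketch of how a validator might accept both lengths, following the description above that old-format numbers get a leading 0 in the 16-digit scheme. `normalise_npwp()` is hypothetical and skips any check-digit validation.

```python
def normalise_npwp(number: str) -> str:
    """Return a 16-digit NPWP, converting the old 15-digit format."""
    digits = number
    for separator in ' -.':
        digits = digits.replace(separator, '')
    if any(c not in '0123456789' for c in digits):
        raise ValueError('NPWP may only contain digits and separators')
    if len(digits) == 15:
        # old-format numbers get a leading 0 in the 16-digit scheme
        digits = '0' + digits
    if len(digits) != 16:
        raise ValueError('NPWP must have 15 or 16 digits')
    return digits
```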

* Replace use of deprecated inspect.getargspec()

Use the inspect.signature() function instead. The inspect.getargspec()
function was removed in Python 3.11.
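A short illustration of the replacement API: `inspect.signature()` exposes the parameters that `getargspec()` used to return, including keyword-only ones and their kinds. The `lookup_number()` stub and `accepts_parameter()` helper are hypothetical and only show one typical use, checking whether a function accepts a given keyword argument.

```python
import inspect


def lookup_number(number, timeout=30, verify=True):  # hypothetical stub
    """Stand-in for a function that performs an online lookup."""
    return None


def accepts_parameter(func, name: str) -> bool:
    """Check whether func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # a **kwargs parameter accepts any keyword
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
```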

* Add Belgian SSN number

Closes arthurdejong/python-stdnum#438

* Fix zeep client timeout parameter

The timeout parameter of the zeep transport class is not responsible for
POST/GET timeouts. The operation_timeout parameter should be used for
that.

See mvantellingen/python-zeep#140

Closes arthurdejong/python-stdnum#444
Closes arthurdejong/python-stdnum#445

* Customise certificate validation for web services

This adds a `verify` argument to all functions that use network services
for lookups. The option is used to configure how certificate validation
works, the same as in the requests library.

For SOAP requests this is implemented properly when using the Zeep
library. The implementations using Suds and PySimpleSOAP have been
updated on a best-effort basis but their use has been deprecated because
they do not seem to work in practice in a lot of cases already.

Related to arthurdejong/python-stdnum#452
Related to arthurdejong/python-stdnum#453

* Add Dutch identiteitskaartnummer

Closes arthurdejong/python-stdnum#449

* Add Belgian eID card number

Closes arthurdejong/python-stdnum#448

* Ensure get_soap_client() caches with verify

This fixes the get_soap_client() function to cache SOAP clients taking
the verify argument into account.

Fixes 3fcebb2
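The bug class here is a cache whose key omits one of the arguments. A minimal sketch with `functools.lru_cache`, where every argument is part of the key, so clients created with different `verify` settings are cached separately. `get_client()` and its placeholder return value are hypothetical, not python-stdnum's `get_soap_client()`.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_client(wsdl: str, timeout: int = 30, verify=True):
    """Create (or reuse) a client; all arguments form the cache key."""
    # Placeholder for constructing a real SOAP client.
    return {'wsdl': wsdl, 'timeout': timeout, 'verify': verify}
```

One caveat: `lru_cache` may treat positional and keyword spellings of the same call as different keys, so callers should pass arguments consistently.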

* Ignore deprecation warnings in flake8 target

This silences a ton of ast deprecation warnings that we can't fix in
python-stdnum anyway.

* Add more tests for Verhoeff implementation

See arthurdejong/python-stdnum#456

* Use older Github runner for Python 3.7 tests

* Add missing music industry ISRC country codes

Closes arthurdejong/python-stdnum#455
Closes arthurdejong/python-stdnum#454

* Allow Uruguay RUT number starting with 22

* Drop Python 2 support

This deprecates the stdnum.util.to_unicode() function because we no
longer have to deal with bytestrings.

* Add International Standard Name Identifier

Closes arthurdejong/python-stdnum#463

* Support Ecuador public RUC with juridical format

It seems that numbers with a format used for juridical RUCs have been
issued to companies.

Closes arthurdejong/python-stdnum#457

* Add Spanish CAE Number

Closes arthurdejong/python-stdnum#446

* Add Russian ОГРН

Closes arthurdejong/python-stdnum#459

* Add support for Python 3.13

* Fix Czech Rodné číslo check digit validation

It seems that a small minority of numbers assigned with a checksum of 10
are still valid and expected to have a check digit value of 0. According
to https://www.domzo13.cz/sw/evok/help/born_numbers.html this practice
even happened (but less frequently) after 1985.

Closes arthurdejong/python-stdnum#468
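The check digit rule described above, as a sketch: for a 10-digit rodné číslo the check digit is the first nine digits modulo 11, with a remainder of 10 written as 0. The date and length handling that a real validator needs is omitted, and `rc_checksum_ok()` is a hypothetical name.

```python
def rc_checksum_ok(number: str) -> bool:
    """Check the last digit of a 10-digit rodné číslo (slash optional)."""
    number = number.replace('/', '')
    if len(number) != 10 or any(c not in '0123456789' for c in number):
        return False
    # remainder of the first nine digits modulo 11; 10 is written as 0
    check = int(number[:9]) % 11 % 10
    return number[9] == str(check)
```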

* Drop more Python 2.7 compatibility code

* Ignore test failures from www.dgii.gov.do

There was a change in the SOAP service and there is a new URL. However,
the API has changed and seems to require authentication.

We ignore test failures for now but unless a solution is found the DGII
validation will be removed.

See: arthurdejong/python-stdnum#462
See: arthurdejong/python-stdnum#461

---------

Co-authored-by: Arthur de Jong
Co-authored-by: Christian Clauss
Co-authored-by: vovavili
Co-authored-by: petr.prikryl
Co-authored-by: Romuald R
Co-authored-by: Leandro Regueiro
Co-authored-by: Dimitri Papadopoulos
Co-authored-by: Blaž Bregar
Co-authored-by: valeriko
Co-authored-by: Ali-Akber Saifee
Co-authored-by: RaduBorzea
Co-authored-by: Jeff Horemans
Co-authored-by: mjturt
Co-authored-by: Victor
Co-authored-by: Chales Horn
Co-authored-by: Blaž Bregar
Co-authored-by: Ömer Boratav
Co-authored-by: Daniel Weber
Co-authored-by: Kevin Dagostino
Co-authored-by: Atul Deolekar
Co-authored-by: vanderkoort
Co-authored-by: Olly Middleton
Co-authored-by: Joris Makauskis
Co-authored-by: Victor Sordoillet
Co-authored-by: Quique Porta
Co-authored-by: nvmbrasserie
@@ -44,7 +44,7 @@
 >>> validate('XX')
 Traceback (most recent call last):
 ...
-InvalidComponent: ...
+ImportError: ...

arthurdejong (Owner) commented on this change:

The validate() function should only raise one of the exceptions listed in stdnum.exceptions.
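A minimal sketch of keeping to that contract: catch the lookup failure inside `validate()` and re-raise it as `InvalidComponent`, so callers only need to handle `ValidationError` (or `ValueError`, which it inherits from). The exception classes are local stand-ins mirroring the names in `stdnum.exceptions`, and `_get_cc_module()` is a hypothetical helper, not the module's actual code.

```python
import importlib
import re


class ValidationError(ValueError):
    """Local stand-in for stdnum.exceptions.ValidationError."""


class InvalidComponent(ValidationError):
    """Local stand-in for stdnum.exceptions.InvalidComponent."""


def _get_cc_module(cc: str):
    """Hypothetical helper: find the country's vat module or raise."""
    cc = cc.lower()
    if not re.fullmatch('[a-z]{2}', cc):
        raise InvalidComponent()  # malformed country code
    if cc in ('if', 'in', 'is'):
        cc += '_'  # assumption: keyword codes use a trailing underscore
    try:
        package = importlib.import_module('stdnum.' + cc)
    except ImportError:
        # translate the implementation detail into the documented exception
        raise InvalidComponent() from None
    module = getattr(package, 'vat', None)
    if module is None:
        raise InvalidComponent()  # country exists but has no vat module
    return module


def validate(number: str) -> str:
    number = number.strip().upper()
    module = _get_cc_module(number[:2])
    return number[:2] + module.validate(number[2:])
```

With this, `validate('XX')` raises `InvalidComponent`, matching the original doctest, instead of leaking an `ImportError`.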
