Valid URLs failing validation - query and fragment parts #296
Comments
Hello, I was just in the middle of creating a bug report as well (thanks for raising this 👍) for previously valid URLs that are reported as invalid after the update from 0.20.0 to 0.21.2 (still seen after updating to 0.22.0, which includes the fix from #289 for #288). These are the cases we're seeing reported as invalid although they are (to the best of my knowledge; I haven't checked all of them) valid:
Here's the patch:
diff --git a/src/validators/url.py b/src/validators/url.py
index 16698b1..16259e7 100644
--- a/src/validators/url.py
+++ b/src/validators/url.py
@@ -3,7 +3,7 @@
 # standard
 from functools import lru_cache
 import re
-from urllib.parse import unquote, urlsplit
+from urllib.parse import parse_qs, unquote, urlsplit

 # local
 from .hostname import hostname
@@ -34,11 +34,6 @@ def _path_regex():
     )


-@lru_cache
-def _query_regex():
-    return re.compile(r"&?(\w+=?[^\s&]*)", re.IGNORECASE)
-
-
 def _validate_scheme(value: str):
     """Validate scheme."""
     # More schemes will be considered later.
@@ -108,16 +103,23 @@ def _validate_netloc(
     ) and _validate_auth_segment(basic_auth)


-def _validate_optionals(path: str, query: str, fragment: str):
+def _validate_optionals(
+    path: str,
+    query: str,
+    fragment: str,
+    strict_query: bool = False
+):
     """Validate path query and fragments."""
     optional_segments = True
     if path:
         optional_segments &= bool(_path_regex().match(path))
-    if query:
-        optional_segments &= bool(_query_regex().match(query))
+    if query and parse_qs(query, strict_parsing=strict_query):
+        optional_segments &= True
     if fragment:
         fragment = fragment.lstrip("/") if fragment.startswith("/") else fragment
-        optional_segments &= all(char_to_avoid not in fragment for char_to_avoid in ("/", "?"))
+        optional_segments &= all(
+            char_to_avoid not in fragment for char_to_avoid in ("?",)
+        )
     return optional_segments


@@ -130,6 +132,7 @@ def url(
     skip_ipv4_addr: bool = False,
     may_have_port: bool = True,
     simple_host: bool = False,
+    strict_query: bool = True,
     rfc_1034: bool = False,
     rfc_2782: bool = False,
 ):
@@ -167,6 +170,8 @@ def url(
             URL string may contain port number.
         simple_host:
             URL string maybe only hyphens and alpha-numerals.
+        strict_query:
+            Fail validation on query string parsing error.
         rfc_1034:
             Allow trailing dot in domain/host name.
             Ref: [RFC 1034](https://www.rfc-editor.org/rfc/rfc1034).
@@ -214,5 +219,5 @@ def url(
             rfc_1034,
             rfc_2782,
         )
-        and _validate_optionals(path, query, fragment)
+        and _validate_optionals(path, query, fragment, strict_query)
     )
After applying, you'll have to pass PR is welcome.
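For anyone trying the patch, it may help to see how `urllib.parse.parse_qs` reacts to `strict_parsing`, since the new `strict_query` flag is forwarded to it. The snippet below is only a sketch: `query_parses` is a hypothetical helper, not part of the library, and it assumes that a `ValueError` raised during parsing is what ends up failing validation.

```python
from urllib.parse import parse_qs


def query_parses(query: str, strict: bool) -> bool:
    """Return True if parse_qs accepts the query string without raising.

    Hypothetical helper for illustration; not part of validators.
    """
    try:
        parse_qs(query, strict_parsing=strict)
    except ValueError:
        # In strict mode, parse_qs raises on fields that lack an "=".
        return False
    return True


# A key=value query parses in both modes.
print(query_parses("a=1&b=2", strict=True))

# A bare token with no "=" (as in the moldaupark.cz example from this issue)
# is rejected only when strict parsing is on.
print(query_parses("-62169987208", strict=True))
print(query_parses("-62169987208", strict=False))
```

Note that in non-strict mode `parse_qs` silently drops such a field and returns an empty dict rather than reporting an error.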
Hi!
I have encountered a few weird URLs that do not pass validation even though they are functional.
The issue is with the validation of the query and fragment parts of the URL.
Furthermore, the query and fragment validations in url.py do not conform to their rather loose definitions in RFC 3986. Is that intended or not? If not, I'll open a PR to fix it.
Examples:
https://vydrica.com/ponuka-bytov/#3d-navigator/
https://www.karpatium.sk/ponuka-byvania#/budova/C2/podlazie/3/byt/96
https://moldaupark.cz/img/reality/262/ddca52d8e0680aa3efe02416daf30eab/vamax800/25.jpg?-62169987208
https://t.rmcl.cz/fhi.cz/estate/i0/81/1024x1024-ke5f0-48bdb-prodej-rodinneho-domu-145m2-pardubice-hostovice-b1-hostovice-int0007-resize-978a19b97a-2226429600.jpg?-JJ3I5Aj2jBX0fDNYvETA9DwXTq_wsCIEBHV-DdFyxT6vVsPh6Ges8F0UgAUKmabnAwcYfA2rSt0SdrwaNFR1lb8vA8Fx_AFNi-B3YlnwsWD_CkjZ19OfbGYlMTWEE_lyTPNHAwESPFFBD1ccfylVagVLIqU1t1N7TmiIV0TGg0Xyd9dKipP7p_4kmWDbY2jJMjDEcOiGs1HiVPe8lytnqCetIQ
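To make the RFC 3986 point concrete, here is a rough sketch of a check that follows the grammar `query = *( pchar / "/" / "?" )` (the `fragment` rule is identical). The function name `is_rfc3986_query_or_fragment` and the regex are my own illustration, not the library's code, and this is not a proposed final implementation.

```python
import re
from urllib.parse import urlsplit

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
# query and fragment additionally allow "/" and "?".
_QUERY_OR_FRAGMENT = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
)


def is_rfc3986_query_or_fragment(value: str) -> bool:
    """Check a query or fragment string against the RFC 3986 character set.

    Hypothetical helper for illustration; not part of validators.
    """
    return _QUERY_OR_FRAGMENT.fullmatch(value) is not None


examples = (
    "https://vydrica.com/ponuka-bytov/#3d-navigator/",
    "https://www.karpatium.sk/ponuka-byvania#/budova/C2/podlazie/3/byt/96",
    "https://moldaupark.cz/img/reality/262/ddca52d8e0680aa3efe02416daf30eab/vamax800/25.jpg?-62169987208",
)

for example in examples:
    parts = urlsplit(example)
    ok = is_rfc3986_query_or_fragment(parts.query) and is_rfc3986_query_or_fragment(
        parts.fragment
    )
    print(example, ok)
```

Under this grammar a fragment may contain "/" and "?", and a query does not need to be in `key=value` form, which is why the examples above would be accepted.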