Skip to content

Valid URLs failing validation - query and fragment parts #296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
danherbriley opened this issue Sep 6, 2023 · 2 comments · Fixed by #297
Closed

Valid URLs failing validation - query and fragment parts #296

danherbriley opened this issue Sep 6, 2023 · 2 comments · Fixed by #297
Labels
bug Issue: Works not as designed

Comments

@danherbriley
Copy link
Contributor

Hi!

I have encountered a few weird urls that are not passing validation even though they are functional.
The issue is with the validation of the query and fragment parts of the URL.
Furthermore, the query and fragment validations in url.py do not conform to their rather loose definitions in RFC3986.
Is that intended or not? If not, I'll open a PR to fix it.

Examples:

@cfi-gb
Copy link

cfi-gb commented Sep 6, 2023

Hello,

i just was in the mid of creating a bug report as well (thanks for raising this 👍) for previously valid URLs which are now reported as invalid after the update from 0.20.0 to 0.21.2 (still seen also after the update to 0.22.0 to include the fix from #289 for #288).

These are the cases we're seeing as reported to be invalid while they are actually (to the best of my knowledge, haven't checked all of them) valid:

https://launchpad.support.sap.com/#/notes/2718993
https://forums.livezilla.net/index.php?/topic/10983-fg-vd-19-086-livezilla-server-is-vulnerable-to-sql-injection-ii/
http://www.brocade.com/en/backend-content/pdf-page.html?/content/dam/common/documents/content-types/security-bulletin/brocade-security-advisory-2016-168.pdf
https://groups.google.com/forum/?fromgroups#!topic/rubyonrails-security/8SA-M3as7A8
https://forum.bitdefender.com/index.php?/topic/75470-doubleagent/
https://www.smartftp.com/forums/index.php?/topic/16425-smartftp-client-40-change-log/
https://exchange.xforce.ibmcloud.com/#/vulnerabilities/100912
https://support.microsoft.com/en-us/lifecycle#gp/Microsoft-Internet-Explorer
https://www.watchguard.com/support/release-notes/fireware/12/en-US/EN_ReleaseNotes_Fireware_12_5_9/index.html#Fireware/en-US/resolved_issues.html
https://code.wireshark.org/review/#/c/25660/
https://groups.google.com/forum/#!topic/kubernetes-announce/yBrFf5nmvfI
https://issues.sonatype.org/plugins/servlet/mobile#issue/NEXUS-16870
http://forums.livezilla.net/index.php?/topic/163-livezilla-changelog/
https://www.watchguard.com/support/release-notes/fireware/11/en-US/EN_ReleaseNotes_Fireware_11_12_1/index.html#Fireware/en-US/resolved_issues.html%3FTocPath%3D_____13
https://review.typo3.org/#/c/37013
https://forums.malwarebytes.org/index.php?/topic/158251-malwarebytes-anti-exploit-hall-of-fame/
http://speedtouch.sourceforge.io/index.php?/news.en.html
https://support.k7computing.com/index.php?/Knowledgebase/Article/View/173/41/advisory-issued-on-6th-november-2017
http://support.novell.com/cgi-bin/search/searchtid.cgi?/10077872.htm

@yozachar
Copy link
Collaborator

yozachar commented Sep 11, 2023

Here's the patch:

�diff --git a/src/validators/url.py b/src/validators/url.py
index 16698b1..16259e7 100644
--- a/src/validators/url.py
+++ b/src/validators/url.py
@@ -3,7 +3,7 @@
 # standard
 from functools import lru_cache
 import re
-from urllib.parse import unquote, urlsplit
+from urllib.parse import parse_qs, unquote, urlsplit
 
 # local
 from .hostname import hostname
@@ -34,11 +34,6 @@ def _path_regex():
     )
 
 
-@lru_cache
-def _query_regex():
-    return re.compile(r"&?(\w+=?[^\s&]*)", re.IGNORECASE)
-
-
 def _validate_scheme(value: str):
     """Validate scheme."""
     # More schemes will be considered later.
@@ -108,16 +103,23 @@ def _validate_netloc(
     ) and _validate_auth_segment(basic_auth)
 
 
-def _validate_optionals(path: str, query: str, fragment: str):
+def _validate_optionals(
+    path: str,
+    query: str,
+    fragment: str,
+    strict_query: bool = False
+):
     """Validate path query and fragments."""
     optional_segments = True
     if path:
         optional_segments &= bool(_path_regex().match(path))
-    if query:
-        optional_segments &= bool(_query_regex().match(query))
+    if query and parse_qs(query, strict_parsing=strict_query):
+        optional_segments &= True
     if fragment:
         fragment = fragment.lstrip("/") if fragment.startswith("/") else fragment
-        optional_segments &= all(char_to_avoid not in fragment for char_to_avoid in ("/", "?"))
+        optional_segments &= all(
+            char_to_avoid not in fragment for char_to_avoid in ("?",)
+        )
     return optional_segments
 
 
@@ -130,6 +132,7 @@ def url(
     skip_ipv4_addr: bool = False,
     may_have_port: bool = True,
     simple_host: bool = False,
+    strict_query: bool = True,
     rfc_1034: bool = False,
     rfc_2782: bool = False,
 ):
@@ -167,6 +170,8 @@ def url(
             URL string may contain port number.
         simple_host:
             URL string maybe only hyphens and alpha-numerals.
+        strict_query:
+            Fail validation on query string parsing error.
         rfc_1034:
             Allow trailing dot in domain/host name.
             Ref: [RFC 1034](https://www.rfc-editor.org/rfc/rfc1034).
@@ -214,5 +219,5 @@ def url(
             rfc_1034,
             rfc_2782,
         )
-        and _validate_optionals(path, query, fragment)
+        and _validate_optionals(path, query, fragment, strict_query)
     )

After applying, you'll have to pass strict_query = False, to get those URLs validated.

PR is welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue: Works not as designed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants