Commit 2bccb8d

Significantly enhance the safety of metadata manipulation
1 parent 5f92462 commit 2bccb8d

10 files changed: +1271 -557 lines changed


CHANGELOG.md

Lines changed: 17 additions & 0 deletions
@@ -11,6 +11,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 - Renamed `filesystem.validate_zimfile_creatable` to `filesystem.file_creatable` to reflect general applicability to check file creation beyond ZIM files #200
 - Remove any "ZIM" reference in exceptions while working with files #200
+- Significantly enhance the safety of metadata manipulation (#205)
+  - add types for all metadata, one type per metadata name plus some generic ones for non-standard metadata
+  - all types are responsible for validating the metadata value at initialization time
+  - validation checks for adherence to the ZIM specification and conventions are automated
+  - cleanup of unwanted control characters and stripping of whitespace characters are automated
+  - whenever possible, try to clean "reasonably" bad metadata (e.g. automatically accept and remove duplicate tags - harmless - but not duplicate language codes - codes are supposed to be ordered, so it is a weird situation)
+  - it is now possible to disable ZIM conventions checks with `zim.metadata.check_metadata_conventions`
+  - simplify `zim.creator.Creator.config_metadata` by using these types and being more strict:
+    - add new `StandardMetadata` class for standard metadata, including the list of mandatory ones
+    - by default, all non-standard metadata must start with `X-` prefix
+      - this is not yet an openZIM convention / specification, so it is possible to disable this check with `fail_on_missing_prefix` argument
+  - simplify `add_metadata`, use same metadata types
+  - simplify `zim.creator.Creator.start` with new types, and drop all metadata from memory after being passed to the libzim
+  - drop `zim.creator.convert_and_check_metadata` (not useful anymore, simply use proper metadata type)
+  - move `MANDATORY_ZIM_METADATA_KEYS` and `DEFAULT_DEV_ZIM_METADATA` from `constants` to `zim.metadata` to avoid circular dependencies
+  - new `inputs.unique_values` utility function to compute the list of unique values from a given list, while preserving initial list order
+  - in `__init__` of `zim.creator.Creator`, rename `disable_metadata_checks` to `check_metadata_conventions` for clarity and brevity

 ### Added
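
The changelog entry above describes metadata types that validate and clean their value at initialization time. The following is only a rough sketch of that idea, not the library's implementation: the class names `SketchTags`, `SketchLanguage` and `SketchCustom` and the helper `clean_text` are hypothetical, the separators (`;` for tags, `,` for language codes) are assumptions, and only the `fail_on_missing_prefix` name and the `unique_values` one-liner are taken from this commit.

```python
import unicodedata


def unique_values(items: list) -> list:
    """Order-preserving de-duplication (same one-liner as inputs.unique_values)."""
    return list(dict.fromkeys(items))


def clean_text(value: str) -> str:
    """Hypothetical cleanup: drop every control character, then strip whitespace."""
    return "".join(ch for ch in value if unicodedata.category(ch) != "Cc").strip()


class SketchTags:
    """Hypothetical tags type: duplicate tags are harmless, so they are dropped."""

    def __init__(self, value: str) -> None:
        tags = [clean_text(tag) for tag in value.split(";")]
        self.value = unique_values([tag for tag in tags if tag])


class SketchLanguage:
    """Hypothetical language type: codes are ordered, so duplicates are rejected."""

    def __init__(self, value: str) -> None:
        codes = [clean_text(code) for code in value.split(",")]
        if any(not code for code in codes):
            raise ValueError("empty language code")
        if len(codes) != len(set(codes)):
            raise ValueError("duplicate language codes")
        self.value = codes


class SketchCustom:
    """Hypothetical non-standard metadata: name must start with 'X-' by default."""

    def __init__(
        self, name: str, value: str, *, fail_on_missing_prefix: bool = True
    ) -> None:
        if fail_on_missing_prefix and not name.startswith("X-"):
            raise ValueError(f"non-standard metadata {name!r} must start with 'X-'")
        self.name = name
        self.value = clean_text(value)


print(SketchTags("wikipedia; offline;wikipedia").value)  # ['wikipedia', 'offline']
```

Because validation happens in `__init__`, an invalid value can never exist as an instance of one of these types, which is presumably what lets `config_metadata` and `add_metadata` be simplified.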

src/zimscraperlib/constants.py

Lines changed: 0 additions & 29 deletions
@@ -1,7 +1,6 @@
 #!/usr/bin/env python3
 # vim: ai ts=4 sts=4 et sw=4 nu

-import base64
 import pathlib
 import re

@@ -21,34 +20,6 @@
 # list of mimetypes we consider articles using it should default to FRONT_ARTICLE
 FRONT_ARTICLE_MIMETYPES = ["text/html"]

-# list of mandatory meta tags of the zim file.
-MANDATORY_ZIM_METADATA_KEYS = [
-    "Name",
-    "Title",
-    "Creator",
-    "Publisher",
-    "Date",
-    "Description",
-    "Language",
-    "Illustration_48x48@1",
-]
-
-DEFAULT_DEV_ZIM_METADATA = {
-    "Name": "Test Name",
-    "Title": "Test Title",
-    "Creator": "Test Creator",
-    "Publisher": "Test Publisher",
-    "Date": "2023-01-01",
-    "Description": "Test Description",
-    "Language": "fra",
-    # blank 48x48 transparent PNG
-    "Illustration_48x48_at_1": base64.b64decode(
-        "iVBORw0KGgoAAAANSUhEUgAAADAAAAAwAQMAAABtzGvEAAAAGXRFWHRTb2Z0d2FyZQBB"
-        "ZG9iZSBJbWFnZVJlYWR5ccllPAAAAANQTFRFR3BMgvrS0gAAAAF0Uk5TAEDm2GYAAAAN"
-        "SURBVBjTY2AYBdQEAAFQAAGn4toWAAAAAElFTkSuQmCC"
-    ),
-}
-
 RECOMMENDED_MAX_TITLE_LENGTH = 30
 MAXIMUM_DESCRIPTION_METADATA_LENGTH = 80
 MAXIMUM_LONG_DESCRIPTION_METADATA_LENGTH = 4000
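
The comment on the removed default illustration calls it a blank 48x48 transparent PNG. As a quick sanity check that is not part of the commit, the sketch below decodes the same base64 payload and reads the width and height from the PNG `IHDR` header. `png_dimensions` is a hypothetical helper introduced here for illustration only.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# same payload as the DEFAULT_DEV_ZIM_METADATA illustration in the diff
BLANK_PNG = base64.b64decode(
    "iVBORw0KGgoAAAANSUhEUgAAADAAAAAwAQMAAABtzGvEAAAAGXRFWHRTb2Z0d2FyZQBB"
    "ZG9iZSBJbWFnZVJlYWR5ccllPAAAAANQTFRFR3BMgvrS0gAAAAF0Uk5TAEDm2GYAAAAN"
    "SURBVBjTY2AYBdQEAAFQAAGn4toWAAAAAElFTkSuQmCC"
)


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # bytes 8-11 hold the chunk length, bytes 12-15 the chunk type
    if data[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    # width and height are two big-endian unsigned 32-bit integers
    width, height = struct.unpack(">II", data[16:24])
    return width, height


print(png_dimensions(BLANK_PNG))  # (48, 48)
```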

src/zimscraperlib/inputs.py

Lines changed: 5 additions & 0 deletions
@@ -136,3 +136,8 @@ def compute_tags(
     return {
         tag.strip() for tag in list(default_tags) + (user_tags or "").split(";") if tag
     }
+
+
+def unique_values(items: list) -> list:
+    """Return unique values in input list while preserving list order"""
+    return list(dict.fromkeys(items))
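
A short usage sketch of the new helper, with made-up sample values. It relies on two documented Python behaviors: `dict` preserves insertion order (guaranteed since Python 3.7), and `dict.fromkeys` keeps the first occurrence of each key. Items must be hashable, so a list of lists would raise `TypeError`.

```python
def unique_values(items: list) -> list:
    """Return unique values in input list while preserving list order"""
    return list(dict.fromkeys(items))


codes = ["fra", "eng", "fra", "deu", "eng"]

# the first occurrence of each value wins; later duplicates are dropped
print(unique_values(codes))  # ['fra', 'eng', 'deu']

# set() also removes duplicates but makes no promise about ordering,
# so it is only safe when order is irrelevant
print(set(codes) == set(unique_values(codes)))  # True
```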
