Improve math character extraction #2009
Comments
I guess this is about supporting the "math" part of https://fontinfo.opensuse.org/fonts/msbm10Medium.html .
@MartinThoma
I don't have Acrobat Reader, but I added a column for Evince. It's the same for the Google Chrome PDF viewer.
I've reviewed your files: they do not include the ToUnicode CMap that pypdf expects in order to translate character codes into text.
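For anyone who wants to check their own files, here is a minimal sketch of how one could list fonts that have no `/ToUnicode` entry. The helper names `has_tounicode` and `fonts_without_tounicode` are made up for this example and are not part of pypdf; the sketch also assumes each page carries its own `/Resources` and ignores resources inherited from the page tree.

```python
from typing import Any, Iterator, Mapping, Tuple


def has_tounicode(font: Mapping[str, Any]) -> bool:
    """Return True if the font dictionary carries a /ToUnicode entry."""
    return "/ToUnicode" in font


def fonts_without_tounicode(pdf_path: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (page_index, resource_name, base_font) for fonts lacking /ToUnicode.

    Hypothetical helper: only looks at /Resources stored directly on the page.
    """
    # Imported lazily so that has_tounicode can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    for index, page in enumerate(reader.pages):
        if "/Resources" not in page:
            continue  # inherited resources are not handled in this sketch
        resources = page["/Resources"]
        if "/Font" not in resources:
            continue
        fonts = resources["/Font"]
        for name in fonts:
            font = fonts[name]  # indexing resolves indirect references
            if not has_tounicode(font):
                yield index, str(name), str(font.get("/BaseFont", "?"))
```

Calling `list(fonts_without_tounicode("some.pdf"))` on a document (the file name is a placeholder) would give the fonts for which extraction has to fall back on the encoding and glyph names.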
Interesting. Maybe I can talk with the However, there are tons of documents out there which are generated by
https://tug.org/pipermail/pdftex/2001-February/000385.html : it seems they became aware of the issue in 2001:
But there were issues:
In https://github.com/mozilla/pdf.js/pull/16735/files it was added to a glyph list which is used in the cmap lookup: https://github.com/mozilla/pdf.js/blob/86165a7ba6d843f3520aa697933d14e6607e1394/src/core/font_renderer.js#L562C70-L562C75 :
Can we just add it to the following?
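To illustrate the general idea of a glyph-name lookup (not the actual snippet referred to above): when a font has no ToUnicode CMap, the `/Differences` array of its `/Encoding` still names the glyphs, and those names can be mapped to Unicode through a glyph list. The sketch below assumes a tiny stand-in table; it is neither pypdf's `adobe_glyphs` nor the pdf.js list, and `differences_to_unicode` is a hypothetical helper name.

```python
from typing import Dict, Mapping, Sequence, Union

# Tiny stand-in table (glyph name -> Unicode), using Adobe Glyph List names.
# A real glyph list has thousands of entries; this one is for illustration only.
GLYPH_TO_UNICODE: Dict[str, str] = {
    "/alpha": "\u03b1",
    "/beta": "\u03b2",
    "/summation": "\u2211",
    "/integral": "\u222b",
}


def differences_to_unicode(
    differences: Sequence[Union[int, str]],
    glyphs: Mapping[str, str] = GLYPH_TO_UNICODE,
) -> Dict[int, str]:
    """Turn a /Differences array into a {character code: Unicode string} map.

    The array alternates between a starting code (int) and one or more glyph
    names; each name is assigned the current code, which then increases by one.
    Names missing from the glyph table are skipped but still consume a code.
    """
    mapping: Dict[int, str] = {}
    code = 0
    for item in differences:
        if isinstance(item, int):
            code = item  # a number resets the running character code
            continue
        if item in glyphs:
            mapping[code] = glyphs[item]
        code += 1
    return mapping
```

With this table, `differences_to_unicode([11, "/alpha", "/beta", 80, "/summation", "/unknownglyph", "/integral"])` returns `{11: "α", 12: "β", 80: "∑", 82: "∫"}`. Supporting one more character then only means adding one more entry to the table.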
Thanks,
this is my output:
There is an inversion in the phi, which seems to be a bug in LaTeX (mathjax/MathJax#353 (comment)). Compared with Acrobat, we are doing better 😁🎇🎇
Closes py-pdf#2009. Note: code cleanup removed duplicates from adobe_glyphs.
Explanation
Extracting math content is super hard and at the moment completely out of reach (e.g. fractions, subscripts / superscripts, curly braces for two cases, roots). However, maybe we can improve the extraction a little bit by supporting some heavily used single characters:
Expectations and current state
Generated via:
with those files:
Proof that it's relevant