pyrealb - A Python Bilingual Text Realizer

Version 3.2.4 - April 2025

pyrealb is a Python adaptation of the JavaScript jsRealB text realizer, using the same notation for constituent and dependency syntax. It can be integrated into a Python application by simply adding:

from pyrealb import *

Online documentation

Version 3.0.0 was a major code reorganization, without any new features, that clearly separates the language-dependent parts from the language-independent ones. This organization is described here.

The use of pyrealb for Bilingual Data-to-text generation is described in this document.

Installing the distribution package from PyPI

pip install pyrealb

Caution: do not forget the b at the end of pyrealb. On PyPI, there is an unrelated package pyreal for evaluating and deploying human readable machine learning explanations.

Upgrading the version

pip install pyrealb --upgrade
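
After an upgrade, it can be useful to confirm which version is actually installed. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and is not part of pyrealb.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="pyrealb"):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version recorded in the installed package metadata,
    # or None when pyrealb is not installed in this environment.
    print("pyrealb version:", installed_version())
```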

Building and installing the package from the sources

  1. cd into this directory (the one containing the pyproject.toml file)
  2. Build the distribution package: python3 -m build
  3. Install it: python3 -m pip install .

First realization tests at the Python 3 prompt

  1. from pyrealb import *
  2. loadEn()
  3. print(S(Pro("I").g("f"),VP(V("say"),"hello",PP(P("to"),NP(D("the"),N("world"))))))
  4. this should print She says hello to the world.
  5. print(root(V("say").t("ps"),subj(Pro("him").c("nom")),comp(N("goodbye"))).typ({"neg":True}))
  6. this should print He did not say goodbye.
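
The same two expressions can also be run from a script instead of the prompt. This is only a sketch: the function name `first_realizations` is hypothetical, and it assumes that `str()` on a pyrealb expression gives the same text that `print()` shows in the steps above.

```python
def first_realizations():
    """Return the realizations of the two expressions from the steps above."""
    # Explicit names are imported here because `from pyrealb import *`
    # is only allowed at module level, not inside a function.
    from pyrealb import loadEn, S, VP, PP, NP, Pro, V, P, D, N, root, subj, comp

    loadEn()  # select English before building expressions
    # Constituent notation (steps 3 and 4)
    constituent = S(Pro("I").g("f"),
                    VP(V("say"), "hello", PP(P("to"), NP(D("the"), N("world")))))
    # Dependency notation (steps 5 and 6)
    dependency = root(V("say").t("ps"),
                      subj(Pro("him").c("nom")),
                      comp(N("goodbye"))).typ({"neg": True})
    return [str(constituent), str(dependency)]


if __name__ == "__main__":
    for sentence in first_realizations():
        print(sentence)
```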

Use pyrealb in a Jupyter notebook

  • Through Binder, load one of these links: English, French
  • From a terminal:
    • if the Python notebook module is not already installed, do: pip3 install notebook and watch a lot of loading...
    • cd Notebooks
    • python3 -m notebook (this should open a browser window with links to the English and French notebooks)

Directories

  • src/pyrealb
    • __init__.py : import classes and functions and export relevant symbols.
    • Constituent.py: Constituent is the top class for methods shared between Phrases and Terminals
    • ConstituentEn.py, ConstituentFr.py : English and French specific processing of Constituent
    • Dependent.py : subclass of Constituent for creating complex phrases using dependencies
    • DependentEn.py, DependentFr.py : English and French specific processing of Dependent
    • lemmatize.py : function for building the lemmata maps
    • Lexicon.py: class to access lexicon entries and syntactic rules
    • LICENSE.txt: Apache 2.0 License
    • NonTerminalEn.py, NonTerminalFr.py : language dependent processing common to Phrase and Dependent
    • Number.py : utility functions for dealing with number formatting
    • Phrase.py : subclass of Constituent for creating complex phrases
    • PhraseEn.py, PhraseFr.py : English and French specific processing of Phrase
    • Terminal.py : subclass of Constituent for creating a single unit (most often a single word)
    • TerminalEn.py, TerminalFr.py : English and French specific processing of Terminal
    • utils.py : some useful functions
  • ./src/pyrealb/data: these resources are identical to the corresponding files in jsRealB
    • LICENSE.txt : Creative Commons license
    • lexicon-en.json : English lexicon (33,932 entries) in json format
    • rule-en.js : English conjugation and declension tables
    • lexicon-fr.json : French lexicon (52,547 entries) in json format
    • rule-fr.js : French conjugation and declension tables
    • lexicon-en.jsonrnc, lexicon-fr.jsonrnc : json-rnc schemas for the lexicons
    • lexicon-en.jsonrnc.json, lexicon-fr.jsonrnc.json : standard JSON schemas corresponding to the json-rnc schemas for the lexicons; these files are created by the validation process.

Nota bene:

  1. In the following directories, the __init__.py file is used to set the appropriate search path for pyrealb functions; this ensures that the current Python source files are used for execution and testing.
  2. Some directories include markup.py, which should normally be installed using pip. Unfortunately, I never managed to make this pip-installed version work: it does not import the name oneliner although it should. It works only if the file is in the local directory.
  • docs: The html and image files should be copied to http://www.iro.umontreal.ca/~lapalme/pyrealb/, which is used for convenient web access.

    • English and French documentation

      • documentation.html : generated documentation, DO NOT EDIT directly (Online version)
      • documentation.py: Python program for generating documentation.html using markup.py
      • style.css: style sheet for the documentation
      • userinfos.py: definitions of variables containing the examples
      • user.js : JavaScript helper script.
    • Supplements to the documentation for specific aspects. Edit a Markdown (.md) file and use the Makefile for generating the html version.

      • Hacking-pyrealb.md : tricks of the trade for dynamic constituent structure modification
      • Lexicon-Format-en.md, Lexicon-Format-fr.md : language specific detailed documentation about lexicon entries.
      • Realizer-Architecture.md : description of the class organisation of the jsRealB/pyrealb ecosystem
      • Makefile :
        • make all : update the documentation after changing documentation.py or any Markdown file
        • make export : list the files that should be present on the web consultation directory
  • IDE : Integrated Development Environment

    • ide.py: built on the Python read-eval-print loop, it imports pyrealb to get the realization of an expression, to consult the lexicon, the conjugation and declension tables. It is also possible to get a lemmatization: i.e. the pyrealb expression corresponding to a form.
    • README.html: documentation and examples

    Nota bene: The evaluation demo of jsRealB is more convenient than this IDE to develop pyrealb expressions as both programs share the same formalism. The jsRealB demo provides an editor and access to the lexicons and rules.

  • Notebooks : Jupyter notebooks (in English and French) which can be used as an executable introduction to pyrealb

  • tests : unit tests of special features of pyrealb in both French and English. They are designed to be launched with pytest. Files have the pattern test_*_{en|fr}.py.

    • README.md: more details on the organisation and use of the test files
    • test_all.sh : run this file in a terminal to run all test files of the directory
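
To run the tests for only one language, the file pattern above can be used to select files. This is a minimal sketch with a hypothetical helper, `language_test_files`, that is not part of the repository; the usage part assumes that pytest is installed and that the script is started from the repository root.

```python
from pathlib import Path


def language_test_files(directory, lang):
    """List the test files for one language, following test_*_{en|fr}.py."""
    if lang not in ("en", "fr"):
        raise ValueError("lang must be 'en' or 'fr'")
    return sorted(Path(directory).glob(f"test_*_{lang}.py"))


if __name__ == "__main__":
    import subprocess
    import sys

    # Launch pytest on the French test files only, if any were found.
    files = language_test_files("tests", "fr")
    if files:
        subprocess.run([sys.executable, "-m", "pytest", *map(str, files)], check=False)
```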

Demos

  • 99bottlesofbeer/99bottlesofbeer.py : simple generation of a classic repetitive text in English.
  • basketball/sportsettsum.py : generation of French and English basketball summaries; paper describing the approach
  • Bilinguo/bilinguo.py : generation of translation drill exercises
  • dev_example/dev_example.py: examples of English and French expressions to be realized and checked against expected output, useful for debugging when adding a new expression and enabling tracing.
  • eliza/eliza-talk.py: French version of Eliza. It illustrates some interesting features of pyrealb. See this document (in French) for an explanation and rationale. It is a Python translation of this jsRealB demo.
  • evenements/evenements.py : Description (in French) of a list of events, it creates HTML.
  • flight_infos/README.md : development of a RASA NLG server giving information about flights, aircraft, etc.
  • gen_from_words.py : generation of English and French sentences from a plain list of words, adding some structure.
  • gen_stanza_uds/*.py : various programs that generate sentences to help the Stanza lemmatizer learn new inflected forms, in French but also in English.
  • gophypi/amr2text.py : generate a literal reading of an AMR (Abstract Meaning Representation); paper describing the approach
  • inflection/inflection.py : French or English conjugation and declension of a form.
  • kilometresapied/kilometresapied.py : simple generation of a classic repetitive text in French.
  • methodius/methodius.py : generation of English sentences from a logical form expressed in XML.
  • randomgen/randomgen.py: Generation of random English sentences
  • RDFpyrealb/WebGenerate.py : Generation from RDF triples
  • report/report.py : Single sentence parameterized by language, tense and subject, using two different program organizations
  • variantes/variantes.py: French or English sentences realized with all possible sentence modifiers; some challenging examples are in examples.py.
  • weather/Bulletin.py: French and English weather bulletins generated from information in a JSON Lines file (weather-data.jsonl). It uses the packages in the Realization directory.
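
For readers unfamiliar with the JSON Lines format used by the weather demo, each line of such a file is an independent JSON value. The sketch below shows one way to read it with the standard library; `read_jsonl` is a hypothetical helper and the sample records are invented for illustration, they do not reflect the actual fields of weather-data.jsonl.

```python
import json


def read_jsonl(lines):
    """Yield one Python object per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


if __name__ == "__main__":
    # Invented records, only to show the shape of the format.
    sample = ['{"city": "Montreal", "temp": -5}', '', '{"city": "Quebec", "temp": -8}']
    for record in read_jsonl(sample):
        print(record["city"], record["temp"])
```

An open text file can be passed directly, since iterating over it yields its lines: `read_jsonl(open("weather-data.jsonl", encoding="utf-8"))`.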

Licenses

  • pyrealb source code is licensed under Apache-2.0
  • Linguistic resources in the ./data directory are licensed under CC-BY-SA-4.0

Contact

Guy Lapalme

Acknowledgement

Thanks to Fabrizio Gotti, François Lareau and Ludan Stoeckle for interesting suggestions.

For the maintainer mainly

Updating package version on PyPI

see this tutorial

These steps take for granted that the password for PyPI has already been given...

  1. Update version number in setup.cfg (it should be the same as python_version in src/pyrealb/utils.py and at the beginning of this document).
  2. Run docs/documentation.py to update the version number in docs/documentation.html
  3. Commit pyrealb on GitHub
  4. cd into the directory with the pyproject.toml file (the same as this README.md)
  5. Build the distribution package
    python3 -m build
  6. Upload the latest version I.J.K to PyPI: twine upload dist/*-I.J.K.*
  7. Install new version from PyPI
    python3 -m pip install pyrealb --upgrade
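
Before uploading, it may help to preview which files the `dist/*-I.J.K.*` pattern selects. This is a minimal sketch using the standard library; `dist_files` is a hypothetical helper, and it assumes the distribution files are in a `dist` directory below the current one.

```python
from pathlib import Path


def dist_files(version, dist_dir="dist"):
    """Return the files in dist_dir whose names match *-<version>.*"""
    return sorted(Path(dist_dir).glob(f"*-{version}.*"))


if __name__ == "__main__":
    # Replace 3.2.4 with the version being released.
    for path in dist_files("3.2.4"):
        print(path)
```

Note that the pattern requires a dot immediately after the version number, so a file whose name continues with a hyphen after the version (as wheel file names conventionally do) is not selected.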

Useful trick for debugging with breakpoints and tracing in PyCharm

  1. add pyrealb expression to debug at the end of demo/dev_example/dev_example.py
  2. comment the line calling testPreviousExamples()
  3. debug demo/dev_example/dev_example.py
