textpipe

Modern text processing pipeline for machine learning applications

Table of Contents
  1. About The Project
  2. Getting Started
  3. Contributing
  4. License
  5. Contact
  6. Acknowledgments

About The Project

textpipe is an end-to-end text processing pipeline designed for modern NLP workflows. It provides:

  • Configurable Processing: YAML-based configuration for all processing steps
  • Modular Architecture: Clean separation of data loading, cleaning, vectorization, and modeling
  • Production Ready: Built-in logging, error handling, and type validation
  • ML Integration: Integrates with scikit-learn models
  • Customizable Components:
    • Multiple text cleaning strategies
    • Configurable tokenization (stemming, stopwords)
    • TF-IDF vectorization with automatic feature management
    • Extensible model architecture
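
To make the stages above concrete, here is a minimal sketch of cleaning, tokenization with stopword removal, and TF-IDF weighting using only the Python standard library. It is not textpipe's implementation: the function names (clean_text, tokenize, tfidf) and the tiny stopword set are hypothetical, and the unsmoothed idf = ln(N / df) used here is only one common variant (scikit-learn's TfidfVectorizer, for example, smooths and normalizes by default).

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword set (hypothetical; real pipelines use a fuller list).
STOPWORDS = {"a", "an", "the", "is", "this", "and", "i"}


def clean_text(text: str) -> str:
    """Lowercase the text and replace every character that is not a-z, 0-9, or whitespace."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def tokenize(text: str, remove_stopwords: bool = True) -> list[str]:
    """Split cleaned text on whitespace, optionally dropping stopwords."""
    tokens = clean_text(text).split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by (count / doc length) * ln(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid dividing by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


texts = ["I love this product!", "Terrible service...", "This product is terrible"]
tokenized = [tokenize(t) for t in texts]
vectors = tfidf(tokenized)
```

In this toy corpus, "love" appears in one document and "product" in two, so "love" receives the higher weight in the first document. That is the behavior TF-IDF is meant to provide: terms that are rare across the corpus count for more.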

Getting Started

Installation

Install the package with pip:

pip install textpipe

Upgrade an existing installation:

pip install textpipe --upgrade

Usage

Basic text processing pipeline example:

from textpipe import Config, load_csv, SentimentPipeline

# Initialize configuration
config = Config.get()

# Load training data
texts, labels = load_csv("data/train.csv")

# Initialize and train pipeline
pipeline = SentimentPipeline(config)
pipeline.train(texts, labels)

# Make predictions
new_texts = ["I love this product!", "Terrible service..."]
predictions = pipeline.predict(new_texts)
print(predictions)
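
The example above does not show what data/train.csv contains. As a rough illustration only, the sketch below reads a CSV with a header row into parallel lists of texts and labels. The column names ("text", "label"), the label values, and the helper read_text_label_csv are assumptions made for this sketch; they are not taken from textpipe, whose load_csv may expect a different layout.

```python
import csv
import io


def read_text_label_csv(handle, text_column="text", label_column="label"):
    """Return parallel lists of texts and labels from a CSV file object with a header row."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        texts.append(row[text_column])
        labels.append(row[label_column])
    return texts, labels


# In-memory stand-in for a training file (contents are made up for illustration).
sample = io.StringIO(
    "text,label\n"
    "I love this product!,positive\n"
    "Terrible service...,negative\n"
)
texts, labels = read_text_label_csv(sample)
```

Keeping texts and labels as two aligned lists matches the shape that pipeline.train(texts, labels) receives in the usage example.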

Advanced configuration example (config.yml):

processing:
  language: english
  remove_stopwords: true
  use_stemming: false
  max_features: 5000
  min_text_length: 3
logging:
  level: INFO
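
One way to read the processing section is as a set of typed settings that are checked once at load time. The sketch below mirrors the keys above in a dataclass and validates them. The names ProcessingConfig and processing_from_dict and the specific validation rules are hypothetical, not textpipe's API; the input is a plain dict of the kind a YAML parser would produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingConfig:
    """Hypothetical typed view of the `processing` section shown above."""
    language: str
    remove_stopwords: bool
    use_stemming: bool
    max_features: int
    min_text_length: int

    def __post_init__(self) -> None:
        # Reject values that cannot describe a usable pipeline.
        if self.max_features <= 0:
            raise ValueError("max_features must be positive")
        if self.min_text_length < 0:
            raise ValueError("min_text_length must not be negative")


def processing_from_dict(raw: dict) -> ProcessingConfig:
    """Build a ProcessingConfig from a parsed config mapping, rejecting unknown keys."""
    section = raw.get("processing", {})
    unknown = set(section) - set(ProcessingConfig.__dataclass_fields__)
    if unknown:
        raise ValueError(f"unknown processing keys: {sorted(unknown)}")
    return ProcessingConfig(**section)


# The same values as the YAML example, already parsed into a dict.
raw_config = {
    "processing": {
        "language": "english",
        "remove_stopwords": True,
        "use_stemming": False,
        "max_features": 5000,
        "min_text_length": 3,
    },
    "logging": {"level": "INFO"},
}
settings = processing_from_dict(raw_config)
```

Failing fast on an unknown key or an out-of-range value means a typo in config.yml surfaces at startup rather than as a confusing error partway through training.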

Contributing

Contributions are what make the open-source community an amazing place to learn, inspire, and create. Any contribution you make is greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Textpipe Team - [email protected]

Project Link: https://github.com/CodexEsto/textpipe

Acknowledgments

  • Scikit-learn community for foundational ML components
  • NLTK team for language processing resources
  • Pandas for data handling capabilities
  • All contributors and open-source maintainers who inspired this work
