
Modern text processing pipeline for machine learning applications
textpipe is an end-to-end text processing pipeline designed for modern NLP workflows. It provides:
- Configurable Processing: YAML-based configuration for all processing steps
- Modular Architecture: Clean separation of data loading, cleaning, vectorization, and modeling
- Production Ready: Built-in logging, error handling, and type validation
- ML Integration: Seamless integration with scikit-learn models
- Customizable Components (illustrated in the sketch after this list):
  - Multiple text cleaning strategies
  - Configurable tokenization (stemming, stopwords)
  - TF-IDF vectorization with automatic feature management
  - Extensible model architecture
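A minimal sketch of what these options control, using NLTK and scikit-learn directly (both acknowledged below). This is an illustration of the underlying cleaning, tokenization, and TF-IDF steps, not textpipe's internal implementation:
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer

# NLTK resources may need a one-time download:
# nltk.download("punkt"); nltk.download("stopwords")

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def clean(text):
    # Lowercase, keep alphabetic tokens, drop stopwords, then stem.
    tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
    return " ".join(stemmer.stem(t) for t in tokens if t not in stop_words)

docs = ["I love this product!", "Terrible service..."]
vectorizer = TfidfVectorizer(max_features=5000)  # mirrors max_features in config.yml
X = vectorizer.fit_transform([clean(d) for d in docs])
print(X.shape)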
Install the package with pip:
pip install textpipe
Update existing installation:
pip install textpipe --upgrade
Basic text processing pipeline example:
from textpipe import Config, load_csv, SentimentPipeline
# Initialize configuration
config = Config.get()
# Load training data
texts, labels = load_csv("data/train.csv")
# Initialize and train pipeline
pipeline = SentimentPipeline(config)
pipeline.train(texts, labels)
# Make predictions
new_texts = ["I love this product!", "Terrible service..."]
predictions = pipeline.predict(new_texts)
print(predictions)
Advanced configuration example (config.yml):
processing:
  language: english
  remove_stopwords: true
  use_stemming: false
  max_features: 5000
  min_text_length: 3

logging:
  level: INFO
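A minimal sketch of using this configuration, assuming Config.get() picks up config.yml from the working directory (how the file is located is not shown above):
from textpipe import Config, load_csv, SentimentPipeline

# Assumption: Config.get() reads the config.yml shown above from the current
# working directory; adjust if your setup resolves the path differently.
config = Config.get()

# Train exactly as in the basic example; the pipeline now applies English
# stopword removal, no stemming, a 5000-feature TF-IDF vocabulary, and a
# minimum text length of 3.
texts, labels = load_csv("data/train.csv")
pipeline = SentimentPipeline(config)
pipeline.train(texts, labels)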
Contributions are what make the open source community an amazing place to learn, inspire, and create. Any contributions are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Textpipe Team - [email protected]
Project Link: https://github.com/CodexEsto/textpipe
- Scikit-learn community for foundational ML components
- NLTK team for language processing resources
- Pandas for data handling capabilities
- All contributors and open-source maintainers who inspired this work