Show HN: Autofit2 – End-to-end pipeline for multilingual text classification
Hacker News (score: 12)
Description
Autofit2 is an integrated pipeline for lightweight multilingual text classification, covering preprocessing, training, and evaluation. It implements SetFit, a few-shot learning technique that works well in low-data regimes (down to a few dozen examples), and it offers high throughput on CPUs because it's based on Sentence Transformers. Dependencies are kept lean, but of course PyTorch itself isn't exactly small.
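To make the few-shot idea concrete: SetFit first fine-tunes the sentence encoder on text pairs built from the handful of labeled examples, then trains a small classification head on the resulting embeddings. The sketch below only illustrates the pair-building step in plain Python. The function name `make_contrastive_pairs` and the sample data are hypothetical, it is not taken from Autofit2, and real implementations typically sample pairs rather than enumerate all of them.

```python
from itertools import combinations


def make_contrastive_pairs(examples):
    """Build (text_a, text_b, target) triples from (text, label) examples.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Hypothetical helper for illustration; not part of Autofit2 or SetFit.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# A tiny, made-up multilingual training set with two classes.
examples = [
    ("great product", "positive"),
    ("me encanta", "positive"),
    ("terrible service", "negative"),
    ("sehr schlecht", "negative"),
]

pairs = make_contrastive_pairs(examples)
for text_a, text_b, target in pairs:
    print(f"{target:.0f}  {text_a!r} <-> {text_b!r}")
```

Because every unordered pair of examples becomes a training signal, even a few dozen labeled texts yield hundreds of pairs, which is roughly why the approach holds up with so little data.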
Autofit2 takes a base model and a JSON config as input, and outputs a TorchServe model archive as well as a model card. The model card includes any benchmarks you have for your task, self-consistency tests, the estimated CO2 emissions of the fine-tuning run, and an entropy-based bias analysis. For the bias evaluation, small test corpora for 50 languages are included. It works best with my EAR (Entropy-based Attention Regularization) fork of Sentence Transformers.
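The post doesn't spell out how the entropy-based bias analysis is computed, so the following is background only: a minimal sketch of Shannon entropy over a probability distribution, such as one token's attention weights. The function names and sample distributions are assumptions for illustration, not Autofit2's actual analysis.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    # Zero-probability entries contribute nothing, so skip them.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalized_entropy(probs):
    """Entropy scaled to [0, 1] by the maximum possible for this length."""
    n = len(probs)
    if n < 2:
        return 0.0
    return shannon_entropy(probs) / math.log2(n)


# Made-up attention distributions over four tokens.
spread_out = [0.25, 0.25, 0.25, 0.25]   # attention shared evenly
peaked = [0.97, 0.01, 0.01, 0.01]       # attention fixed on one token

print(normalized_entropy(spread_out))
print(normalized_entropy(peaked))
```

Low entropy means the distribution is concentrated on a few entries. For attention weights, that can be a hint that a model leans heavily on individual terms.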
Feedback is welcome.