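We introduce SetFit, an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers. SetFit achieves high accuracy with little labeled data: for example, with only 8 labeled examples per class on the Customer Reviews sentiment dataset, SetFit is competitive with fine-tuning on the full training set of 3k examples.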
Compared to other few-shot learning methods, SetFit has several unique features.
Download and install `setfit` by running:

```bash
python -m pip install setfit
```
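`setfit` is integrated with the Hugging Face Hub and provides two main classes:

- `SetFitModel`: a wrapper that combines a pretrained body from `sentence_transformers` and a classification head from `scikit-learn`.
- `SetFitTrainer`: a helper class that wraps the fine-tuning process of SetFit.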
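To make the body + head split concrete, here is a toy, dependency-free sketch. Everything in it is hypothetical and not the setfit implementation: `embed` stands in for the Sentence Transformer body with an L2-normalized bag-of-words vector, `NearestCentroidHead` stands in for the scikit-learn head (setfit's default head is a logistic regression), and `ToySetFitModel` only mirrors the structure of embedding texts first and classifying the embeddings second.

```python
import math
from collections import Counter, defaultdict


def embed(text, vocab):
    """Hypothetical stand-in for the Sentence Transformer body: an L2-normalized bag-of-words vector."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class NearestCentroidHead:
    """Hypothetical stand-in for the scikit-learn classification head."""

    def fit(self, X, y):
        dim = len(X[0])
        sums = defaultdict(lambda: [0.0] * dim)
        counts = Counter(y)
        for x, label in zip(X, y):
            sums[label] = [a + b for a, b in zip(sums[label], x)]
        # One mean embedding (centroid) per class
        self.centroids = {label: [v / counts[label] for v in s] for label, s in sums.items()}
        return self

    def predict(self, X):
        def score(x, centroid):
            # Dot product between an embedding and a class centroid
            return sum(a * b for a, b in zip(x, centroid))

        return [max(self.centroids, key=lambda label: score(x, self.centroids[label])) for x in X]


class ToySetFitModel:
    """Hypothetical wrapper mirroring the body + head split; not the setfit API."""

    def __init__(self, vocab, head):
        self.vocab = vocab
        self.head = head

    def fit(self, texts, labels):
        # Embed with the "body", then fit the "head" on the embeddings
        self.head.fit([embed(t, self.vocab) for t in texts], labels)
        return self

    def __call__(self, texts):
        return self.head.predict([embed(t, self.vocab) for t in texts])


vocab = ["great", "love", "awful", "hate", "movie"]
model = ToySetFitModel(vocab, NearestCentroidHead()).fit(
    ["great movie", "love it", "awful movie", "hate it"], [1, 1, 0, 0]
)
print(model(["i love this great film", "what an awful waste"]))  # [1, 0]
```

In SetFit itself the body is first fine-tuned contrastively on sentence pairs, and only then is the head trained on its embeddings; the toy skips that first stage.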
Here is an end-to-end example:
```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss

from setfit import SetFitModel, SetFitTrainer

# Load a dataset from the Hugging Face Hub
dataset = load_dataset("emotion")

# Simulate the few-shot regime by sampling 8 * num_classes examples
# (a random draw, so roughly 8 examples per class rather than exactly 8)
num_classes = 6
train_ds = dataset["train"].shuffle(seed=42).select(range(8 * num_classes))
test_ds = dataset["test"]

# Load a SetFit model from the Hub
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

# Create the trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    loss_class=CosineSimilarityLoss,
    batch_size=16,
    num_iterations=20,  # Controls how many text pairs are generated for contrastive learning
)

# Train and evaluate
trainer.train()
metrics = trainer.evaluate()

# Push the model to the Hub
trainer.push_to_hub("my-awesome-setfit-model")
```
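The `num_iterations` argument controls how many sentence pairs are generated for the contrastive stage, in which `CosineSimilarityLoss` fine-tunes the body so that the cosine similarity of each pair's embeddings approaches a target of 1.0 (same class) or 0.0 (different classes). The sketch below is a simplified, hypothetical reimplementation of that pair generation. It assumes, as in early setfit releases, that each iteration draws one positive and one negative partner per training example; `generate_pairs` is not part of the setfit API, so check the source of your installed version for the exact sampling scheme.

```python
import random


def generate_pairs(texts, labels, num_iterations, seed=42):
    """Hypothetical sketch: build (anchor, partner, target) triples for contrastive fine-tuning.

    Each iteration pairs every example with one same-label partner (target 1.0)
    and one different-label partner (target 0.0). Needs at least two classes.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    pairs = []
    for _ in range(num_iterations):
        for text, label in zip(texts, labels):
            # Positive partner: any text with the same label (may be the text itself)
            positive = rng.choice(by_label[label])
            # Negative partner: any text from a different class
            others = [t for other, group in by_label.items() if other != label for t in group]
            negative = rng.choice(others)
            pairs.append((text, positive, 1.0))
            pairs.append((text, negative, 0.0))
    return pairs


texts = ["great movie", "love it", "awful movie", "hate it"]
labels = [1, 1, 0, 0]
pairs = generate_pairs(texts, labels, num_iterations=20)
print(len(pairs))  # 160 = 4 examples * 20 iterations * 2 pairs
```

Under that assumption, the end-to-end example above (48 training texts, `num_iterations=20`) would yield 48 × 20 × 2 = 1,920 pairs, far more training signal than the 48 labeled examples alone.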
For more examples, check out the `notebooks/` folder.
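We provide scripts to reproduce the results for SetFit and the various baselines presented in Table 2 of our paper. See the setup and training instructions in the `scripts/` directory.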
To run the code in this project, first create a Python virtual environment, e.g. with Conda:

```bash
conda create -n setfit python=3.9 && conda activate setfit
```
Then install the base requirements with:

```bash
python -m pip install -e '.[dev]'
```
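This installs `datasets` and packages like `black` and `isort`, which we use to ensure consistent code formatting. Next, go to one of the dedicated baseline directories and install the extra dependencies, e.g.:

```bash
cd scripts/setfit
python -m pip install -r requirements.txt
```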
After following the installation steps, you can check your code locally with `black` and `isort` by running:

```bash
make style && make quality
```
```
├── LICENSE
├── Makefile        <- Makefile with commands like `make style` or `make tests`
├── README.md       <- The top-level README for developers using this project.
├── notebooks       <- Jupyter notebooks.
├── final_results   <- Model predictions from the paper
├── scripts         <- Scripts for training and inference
├── setup.cfg       <- Configuration file to define package metadata
├── setup.py        <- Make this project pip installable with `pip install -e`
├── src             <- Source code for SetFit
└── tests           <- Unit tests
```
```bibtex
@misc{https://doi.org/10.48550/arxiv.2209.11055,
  doi = {10.48550/ARXIV.2209.11055},
  url = {https://arxiv.org/abs/2209.11055},
  author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```