What we're running with at the time this documentation was generated:
torch: 1.9.0+cu102
fastai: 2.7.9
transformers: 4.21.2
text.utils

Contains various text-specific utility classes/functions.
get_hf_objects
get_hf_objects(pretrained_model_name_or_path:str|os.PathLike, model_cls:transformers.modeling_utils.PreTrainedModel, config:transformers.configuration_utils.PretrainedConfig|str|os.PathLike=None, tokenizer_cls:transformers.tokenization_utils_base.PreTrainedTokenizerBase=None, config_kwargs:dict={}, tokenizer_kwargs:dict={}, model_kwargs:dict={}, cache_dir:str|os.PathLike=None)
Given at minimum a pretrained_model_name_or_path and a model_cls (such as AutoModelForSequenceClassification), this method returns all the Hugging Face objects you need to train a model using Blurr.
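For instance, a minimal call looks like the sketch below; the checkpoint name is just an illustration, and fuller examples with their outputs follow later in this section.

from transformers import AutoModelForSequenceClassification

# get_hf_objects returns a 4-tuple: the architecture name, the config,
# the tokenizer, and the model
arch, config, tokenizer, model = get_hf_objects(
    "bert-base-cased",  # any Hugging Face checkpoint name or local path
    model_cls=AutoModelForSequenceClassification,
)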
Singleton

Singleton(*args, **kwargs)
BlurrText is a Singleton (there exists only one instance, and the same instance is returned upon subsequent instantiation requests). You can get at it via the NLP constant below.
NLP = BlurrText()
NLP2 = BlurrText()
test_eq(NLP, NLP2)
… the tasks
# show_doc(BlurrText.get_tasks)
print(NLP.get_tasks())
print("")
print(NLP.get_tasks("bart"))
['AudioFrameClassification', 'CTC', 'CausalImageModeling', 'CausalLM', 'Classification', 'ConditionalGeneration', 'DepthEstimation', 'EntityClassification', 'EntityPairClassification', 'EntitySpanClassification', 'Generation', 'ImageAndTextRetrieval', 'ImageClassification', 'ImageClassificationConvProcessing', 'ImageClassificationFourier', 'ImageClassificationLearned', 'ImagesAndTextClassification', 'InstanceSegmentation', 'LMHead', 'LMHeadModel', 'MaskedImageModeling', 'MaskedLM', 'MultimodalAutoencoding', 'MultipleChoice', 'NextSentencePrediction', 'ObjectDetection', 'OpenQA', 'OpticalFlow', 'PreTraining', 'QuestionAnswering', 'QuestionAnsweringSimple', 'RegionToPhraseAlignment', 'SemanticSegmentation', 'SequenceClassification', 'Teacher', 'TokenClassification', 'VisualReasoning', 'XVector', 'merLayer', 'merModel', 'merPreTrainedModel']
['CausalLM', 'ConditionalGeneration', 'QuestionAnswering', 'SequenceClassification']
… the architectures
# show_doc(BlurrText.get_architectures)
print(NLP.get_architectures())
['albert', 'bart', 'barthez', 'bartpho', 'beit', 'bert', 'bert_generation', 'bert_japanese', 'bertweet', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'byt5', 'camembert', 'canine', 'clip', 'codegen', 'convbert', 'convnext', 'cpm', 'ctrl', 'cvt', 'data2vec_audio', 'data2vec_text', 'data2vec_vision', 'deberta', 'deberta_v2', 'decision_transformer', 'deit', 'detr', 'distilbert', 'dpr', 'dpt', 'electra', 'encoder_decoder', 'flaubert', 'flava', 'fnet', 'fsmt', 'funnel', 'glpn', 'gpt2', 'gpt_neo', 'gpt_neox', 'gptj', 'groupvit', 'herbert', 'hubert', 'ibert', 'imagegpt', 'layoutlm', 'layoutlmv2', 'layoutlmv3', 'layoutxlm', 'led', 'levit', 'longformer', 'longt5', 'luke', 'lxmert', 'm2m_100', 'marian', 'maskformer', 'mbart', 'mbart50', 'mctct', 'megatron_bert', 'mluke', 'mmbt', 'mobilebert', 'mobilevit', 'mpnet', 'mt5', 'mvp', 'nezha', 'nllb', 'nystromformer', 'openai', 'opt', 'owlvit', 'pegasus', 'perceiver', 'phobert', 'plbart', 'poolformer', 'prophetnet', 'qdqbert', 'rag', 'realm', 'reformer', 'regnet', 'rembert', 'resnet', 'retribert', 'roberta', 'roformer', 'segformer', 'sew', 'sew_d', 'speech_encoder_decoder', 'speech_to_text', 'speech_to_text_2', 'splinter', 'squeezebert', 'swin', 't5', 'tapas', 'tapex', 'trajectory_transformer', 'transfo_xl', 'trocr', 'unispeech', 'unispeech_sat', 'van', 'vilt', 'vision_encoder_decoder', 'vision_text_dual_encoder', 'visual_bert', 'vit', 'vit_mae', 'wav2vec2', 'wav2vec2_conformer', 'wav2vec2_phoneme', 'wavlm', 'xglm', 'xlm', 'xlm_prophetnet', 'xlm_roberta', 'xlm_roberta_xl', 'xlnet', 'yolos', 'yoso']
# show_doc(BlurrText.get_model_architecture)
print(NLP.get_model_architecture("RobertaForSequenceClassification"))
roberta
… and lastly the models (optionally for a given task and/or architecture)
# show_doc(BlurrText.get_models)
print(L(NLP.get_models())[:5])
['AdaptiveEmbedding', 'AlbertForMaskedLM', 'AlbertForMultipleChoice', 'AlbertForPreTraining', 'AlbertForQuestionAnswering']
print(NLP.get_models(arch="bert")[:5])
['BertForMaskedLM', 'BertForMultipleChoice', 'BertForNextSentencePrediction', 'BertForPreTraining', 'BertForQuestionAnswering']
print(NLP.get_models(task="TokenClassification")[:5])
['AlbertForTokenClassification', 'BertForTokenClassification', 'BigBirdForTokenClassification', 'BloomForTokenClassification', 'CamembertForTokenClassification']
print(NLP.get_models(arch="bert", task="TokenClassification"))
['BertForTokenClassification']
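These lookups compose nicely. As a sketch, you can use get_models and get_model_architecture together to pick a model class before loading it; the names here come straight from the outputs above.

# find the model classes that support token classification for the "bert" architecture ...
model_names = NLP.get_models(arch="bert", task="TokenClassification")
print(model_names)  # ['BertForTokenClassification']

# ... and confirm which architecture a given model class belongs to
print(NLP.get_model_architecture(model_names[0]))  # bert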
To get all of your Hugging Face objects (arch, config, tokenizer, and model) in one shot, use get_hf_objects.

How to use:
from transformers import AutoModelForMaskedLM
hf_logging.set_verbosity_error()
= get_hf_objects("bert-base-cased-finetuned-mrpc", model_cls=AutoModelForMaskedLM)
arch, config, tokenizer, model
print(arch)
print(type(config))
print(type(tokenizer))
print(type(model))
bert
<class 'transformers.models.bert.configuration_bert.BertConfig'>
<class 'transformers.models.bert.tokenization_bert_fast.BertTokenizerFast'>
<class 'transformers.models.bert.modeling_bert.BertForMaskedLM'>
from transformers import AutoModelForQuestionAnswering
hf_logging.set_verbosity_error()
= get_hf_objects("fmikaelian/flaubert-base-uncased-squad", model_cls=AutoModelForQuestionAnswering)
arch, config, tokenizer, model
print(arch)
print(type(config))
print(type(tokenizer))
print(type(model))
flaubert
<class 'transformers.models.flaubert.configuration_flaubert.FlaubertConfig'>
<class 'transformers.models.flaubert.tokenization_flaubert.FlaubertTokenizer'>
<class 'transformers.models.flaubert.modeling_flaubert.FlaubertForQuestionAnsweringSimple'>
from transformers import BertTokenizer, BertForNextSentencePrediction
hf_logging.set_verbosity_error()
arch, config, tokenizer, model = get_hf_objects(
    "bert-base-cased-finetuned-mrpc", config=None, tokenizer_cls=BertTokenizer, model_cls=BertForNextSentencePrediction
)

print(arch)
print(type(config))
print(type(tokenizer))
print(type(model))
bert
<class 'transformers.models.bert.configuration_bert.BertConfig'>
<class 'transformers.models.bert.tokenization_bert.BertTokenizer'>
<class 'transformers.models.bert.modeling_bert.BertForNextSentencePrediction'>