What we're running with at the time this documentation was generated:
torch: 1.9.0+cu102
fastai: 2.7.9
transformers: 4.21.2
text.utils

Contains various text-specific utility classes/functions.
get_hf_objects
get_hf_objects(pretrained_model_name_or_path:str|os.PathLike, model_cls:transformers.modeling_utils.PreTrainedModel, config:transformers.configuration_utils.PretrainedConfig|str|os.PathLike=None, tokenizer_cls:transformers.tokenization_utils_base.PreTrainedTokenizerBase=None, config_kwargs:dict={}, tokenizer_kwargs:dict={}, model_kwargs:dict={}, cache_dir:str|os.PathLike=None)
Given at minimum a pretrained_model_name_or_path and a model_cls (such as AutoModelForSequenceClassification), this method returns all the Hugging Face objects you need to train a model using Blurr.
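For instance, a minimal call looks like the sketch below; the checkpoint name is just an illustration, and fuller examples with their outputs follow later in this section.

from transformers import AutoModelForSequenceClassification

# get_hf_objects returns a 4-tuple: the architecture name, the config,
# the tokenizer, and the model
arch, config, tokenizer, model = get_hf_objects(
    "bert-base-cased",  # any Hugging Face checkpoint name or local path
    model_cls=AutoModelForSequenceClassification,
)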
Singleton

Singleton(*args, **kwargs)
BlurrText is a Singleton (there exists only one instance, and the same instance is returned upon subsequent instantiation requests). You can get at it via the NLP constant below.
NLP = BlurrText()
NLP2 = BlurrText()
test_eq(NLP, NLP2)
… the tasks
# show_doc(BlurrText.get_tasks)
print(NLP.get_tasks())
print("")
print(NLP.get_tasks("bart"))
['AudioFrameClassification', 'CTC', 'CausalImageModeling', 'CausalLM', 'Classification', 'ConditionalGeneration', 'DepthEstimation', 'EntityClassification', 'EntityPairClassification', 'EntitySpanClassification', 'Generation', 'ImageAndTextRetrieval', 'ImageClassification', 'ImageClassificationConvProcessing', 'ImageClassificationFourier', 'ImageClassificationLearned', 'ImagesAndTextClassification', 'InstanceSegmentation', 'LMHead', 'LMHeadModel', 'MaskedImageModeling', 'MaskedLM', 'MultimodalAutoencoding', 'MultipleChoice', 'NextSentencePrediction', 'ObjectDetection', 'OpenQA', 'OpticalFlow', 'PreTraining', 'QuestionAnswering', 'QuestionAnsweringSimple', 'RegionToPhraseAlignment', 'SemanticSegmentation', 'SequenceClassification', 'Teacher', 'TokenClassification', 'VisualReasoning', 'XVector', 'merLayer', 'merModel', 'merPreTrainedModel']
['CausalLM', 'ConditionalGeneration', 'QuestionAnswering', 'SequenceClassification']
… the architectures
# show_doc(BlurrText.get_architectures)
print(NLP.get_architectures())
['albert', 'bart', 'barthez', 'bartpho', 'beit', 'bert', 'bert_generation', 'bert_japanese', 'bertweet', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'byt5', 'camembert', 'canine', 'clip', 'codegen', 'convbert', 'convnext', 'cpm', 'ctrl', 'cvt', 'data2vec_audio', 'data2vec_text', 'data2vec_vision', 'deberta', 'deberta_v2', 'decision_transformer', 'deit', 'detr', 'distilbert', 'dpr', 'dpt', 'electra', 'encoder_decoder', 'flaubert', 'flava', 'fnet', 'fsmt', 'funnel', 'glpn', 'gpt2', 'gpt_neo', 'gpt_neox', 'gptj', 'groupvit', 'herbert', 'hubert', 'ibert', 'imagegpt', 'layoutlm', 'layoutlmv2', 'layoutlmv3', 'layoutxlm', 'led', 'levit', 'longformer', 'longt5', 'luke', 'lxmert', 'm2m_100', 'marian', 'maskformer', 'mbart', 'mbart50', 'mctct', 'megatron_bert', 'mluke', 'mmbt', 'mobilebert', 'mobilevit', 'mpnet', 'mt5', 'mvp', 'nezha', 'nllb', 'nystromformer', 'openai', 'opt', 'owlvit', 'pegasus', 'perceiver', 'phobert', 'plbart', 'poolformer', 'prophetnet', 'qdqbert', 'rag', 'realm', 'reformer', 'regnet', 'rembert', 'resnet', 'retribert', 'roberta', 'roformer', 'segformer', 'sew', 'sew_d', 'speech_encoder_decoder', 'speech_to_text', 'speech_to_text_2', 'splinter', 'squeezebert', 'swin', 't5', 'tapas', 'tapex', 'trajectory_transformer', 'transfo_xl', 'trocr', 'unispeech', 'unispeech_sat', 'van', 'vilt', 'vision_encoder_decoder', 'vision_text_dual_encoder', 'visual_bert', 'vit', 'vit_mae', 'wav2vec2', 'wav2vec2_conformer', 'wav2vec2_phoneme', 'wavlm', 'xglm', 'xlm', 'xlm_prophetnet', 'xlm_roberta', 'xlm_roberta_xl', 'xlnet', 'yolos', 'yoso']
# show_doc(BlurrText.get_model_architecture)
print(NLP.get_model_architecture("RobertaForSequenceClassification"))
roberta
… and lastly the models (optionally for a given task and/or architecture)
# show_doc(BlurrText.get_models)
print(L(NLP.get_models())[:5])
['AdaptiveEmbedding', 'AlbertForMaskedLM', 'AlbertForMultipleChoice', 'AlbertForPreTraining', 'AlbertForQuestionAnswering']
print(NLP.get_models(arch="bert")[:5])
['BertForMaskedLM', 'BertForMultipleChoice', 'BertForNextSentencePrediction', 'BertForPreTraining', 'BertForQuestionAnswering']
print(NLP.get_models(task="TokenClassification")[:5])
['AlbertForTokenClassification', 'BertForTokenClassification', 'BigBirdForTokenClassification', 'BloomForTokenClassification', 'CamembertForTokenClassification']
print(NLP.get_models(arch="bert", task="TokenClassification"))
['BertForTokenClassification']
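These lookups compose nicely. As a sketch, you can use get_models and get_model_architecture together to pick a model class before loading it; the names here come straight from the outputs above.

# find the model classes that support token classification for the "bert" architecture ...
model_names = NLP.get_models(arch="bert", task="TokenClassification")
print(model_names)  # ['BertForTokenClassification']

# ... and confirm which architecture a given model class belongs to
print(NLP.get_model_architecture(model_names[0]))  # bert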
To get all of your Hugging Face objects (arch, config, tokenizer, and model) in one shot, use get_hf_objects.

How to use:
from transformers import AutoModelForMaskedLM
hf_logging.set_verbosity_error()
= get_hf_objects("bert-base-cased-finetuned-mrpc", model_cls=AutoModelForMaskedLM)
arch, config, tokenizer, model
print(arch)
print(type(config))
print(type(tokenizer))
print(type(model))
bert
<class 'transformers.models.bert.configuration_bert.BertConfig'>
<class 'transformers.models.bert.tokenization_bert_fast.BertTokenizerFast'>
<class 'transformers.models.bert.modeling_bert.BertForMaskedLM'>
from transformers import AutoModelForQuestionAnswering
hf_logging.set_verbosity_error()
= get_hf_objects("fmikaelian/flaubert-base-uncased-squad", model_cls=AutoModelForQuestionAnswering)
arch, config, tokenizer, model
print(arch)
print(type(config))
print(type(tokenizer))
print(type(model))
flaubert
<class 'transformers.models.flaubert.configuration_flaubert.FlaubertConfig'>
<class 'transformers.models.flaubert.tokenization_flaubert.FlaubertTokenizer'>
<class 'transformers.models.flaubert.modeling_flaubert.FlaubertForQuestionAnsweringSimple'>
from transformers import BertTokenizer, BertForNextSentencePrediction
hf_logging.set_verbosity_error()
arch, config, tokenizer, model = get_hf_objects(
    "bert-base-cased-finetuned-mrpc", config=None, tokenizer_cls=BertTokenizer, model_cls=BertForNextSentencePrediction
)

print(arch)
print(type(config))
print(type(tokenizer))
print(type(model))
bert
<class 'transformers.models.bert.configuration_bert.BertConfig'>
<class 'transformers.models.bert.tokenization_bert.BertTokenizer'>
<class 'transformers.models.bert.modeling_bert.BertForNextSentencePrediction'>