Various utility functions used by the blurr package.

Utility classes and methods

The Singleton class and str_to_type method are used in the construction of the BLURR instance, as well as in other parts of the library. print_versions is a nicety for developers wishing to know which versions of specific libraries are being used, for documentation and troubleshooting.

class Singleton[source]

Singleton()

Singleton functions as a Python decorator. Use it above any class to turn that class into a singleton (see here for more info on the singleton pattern).

@Singleton
class TestSingleton: pass

a = TestSingleton()
b = TestSingleton()
test_eq(a,b)
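The decorator itself can be sketched in a few lines. This is a minimal illustration of the pattern, not necessarily blurr's exact implementation:

```python
class Singleton:
    """Minimal sketch of a singleton decorator: cache the first instance
    of the decorated class and return it on every subsequent call."""

    def __init__(self, cls):
        self._cls, self._instance = cls, None

    def __call__(self, *args, **kwargs):
        if self._instance is None:
            self._instance = self._cls(*args, **kwargs)
        return self._instance


@Singleton
class TestSingleton:
    pass

a = TestSingleton()
b = TestSingleton()
assert a is b  # both names point at the same cached instance
```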

str_to_type[source]

str_to_type(typename:str)

Converts a type represented as a string to the actual class

Parameters:

  • typename : <class 'str'>

    The name of a type as a string

Returns:

  • typing.Type

    Returns the actual type

How to use:

print(str_to_type('List'))
print(str_to_type('test_eq'))
print(str_to_type('TestSingleton'))
typing.List
<function test_eq at 0x7fa845c60280>
<__main__.Singleton object at 0x7fa84568e250>
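One plausible minimal implementation (a sketch only, not blurr's actual code) resolves the name against the `typing` module first and falls back to the caller's global namespace, which is consistent with the outputs above:

```python
import sys
import typing


def str_to_type(typename: str) -> type:
    """Sketch: resolve a type name by checking `typing` first,
    then the caller's global namespace."""
    if hasattr(typing, typename):
        return getattr(typing, typename)
    return sys._getframe(1).f_globals[typename]


print(str_to_type("List"))  # typing.List

class Demo: ...
print(str_to_type("Demo"))  # the Demo class defined just above
```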

print_versions(packages:Union[str, List[str]])

Prints the name and version of one or more packages in your environment

Parameters:

  • packages : typing.Union[str, typing.List[str]]

    A string of space delimited package names or a list of package names

How to use:

print_versions('torch transformers fastai')
print('---')
print_versions(['torch', 'transformers', 'fastai'])
torch: 1.7.1
transformers: 4.9.2
fastai: 2.5.0
---
torch: 1.7.1
transformers: 4.9.2
fastai: 2.5.0
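The behavior above can be sketched with the standard library's `importlib.metadata`. The `version_fn` parameter is an injection point added here for testability and is not part of blurr's API:

```python
from importlib.metadata import version
from typing import List, Union


def print_versions(packages: Union[str, List[str]], version_fn=version) -> None:
    """Sketch: accept either a space-delimited string or a list of
    package names and print `name: version` for each."""
    if isinstance(packages, str):
        packages = packages.split()
    for name in packages:
        print(f"{name}: {version_fn(name)}")
```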

BlurrUtil[source]

BlurrUtil(*args, **kwargs)

BlurrUtil is a Singleton (there exists only one instance, and the same instance is returned upon subsequent instantiation requests). You can get at it via the BLURR constant below.

mh = BlurrUtil()
mh2 = BlurrUtil()
test_eq(mh, mh2)

Provide a global helper constant

Users of this library can simply use BLURR to access all the BlurrUtil capabilities without having to fetch an instance themselves.

Here's how you can get at the core Hugging Face objects you need to work with ...

... the task

BlurrUtil.get_tasks[source]

BlurrUtil.get_tasks(arch:str=None)

This method can be used to get a list of all tasks supported by your transformers install, or just those available to a specific architecture

Parameters:

  • arch : <class 'str'>, optional

    A transformer architecture (e.g., 'bert')

print(BLURR.get_tasks())
print('')
print(BLURR.get_tasks('bart'))
['CTC', 'CausalLM', 'Classification', 'ConditionalGeneration', 'EntityClassification', 'EntityPairClassification', 'EntitySpanClassification', 'Generation', 'ImageClassification', 'LMHead', 'LMHeadModel', 'MaskedLM', 'MultipleChoice', 'NextSentencePrediction', 'PreTraining', 'QuestionAnswering', 'QuestionAnsweringSimple', 'RegionToPhraseAlignment', 'SequenceClassification', 'Teacher', 'TokenClassification', 'VisualReasoning', 'merLayer', 'merModel', 'merPreTrainedModel']

['CausalLM', 'ConditionalGeneration', 'QuestionAnswering', 'SequenceClassification']
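Conceptually, the task names above look like the suffixes of transformers model class names (everything after `For`). A hedged sketch of that idea over a small, hypothetical sample of class names (the real method presumably scans the installed transformers namespace):

```python
from typing import List, Optional

# A tiny illustrative subset of transformers model class names
SAMPLE_MODELS = [
    "BertForMaskedLM", "BartForConditionalGeneration",
    "BartForQuestionAnswering", "RobertaForTokenClassification",
]


def get_tasks(arch: Optional[str] = None,
              models: List[str] = SAMPLE_MODELS) -> List[str]:
    """Sketch: a task is the suffix after 'For' in a model class name;
    optionally restrict to names starting with `arch`."""
    if arch:
        models = [m for m in models if m.lower().startswith(arch.lower())]
    return sorted({m.split("For", 1)[1] for m in models if "For" in m})


print(get_tasks("bart"))  # ['ConditionalGeneration', 'QuestionAnswering']
```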

... the architecture

BlurrUtil.get_architectures[source]

BlurrUtil.get_architectures()

print(BLURR.get_architectures())
['albert', 'albert_fast', 'bart', 'bart_fast', 'barthez', 'barthez_fast', 'bert', 'bert_fast', 'bert_generation', 'bert_japanese', 'bertweet', 'big_bird', 'big_bird_fast', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'byt5', 'camembert', 'camembert_fast', 'canine', 'clip', 'clip_fast', 'convbert', 'convbert_fast', 'cpm', 'ctrl', 'deberta', 'deberta_fast', 'deberta_v2', 'deit', 'detr', 'distilbert', 'distilbert_fast', 'dpr', 'dpr_fast', 'electra', 'electra_fast', 'encoder_decoder', 'flaubert', 'fsmt', 'funnel', 'funnel_fast', 'gpt2', 'gpt2_fast', 'gpt_neo', 'herbert', 'herbert_fast', 'hubert', 'ibert', 'layoutlm', 'layoutlm_fast', 'led', 'led_fast', 'longformer', 'longformer_fast', 'luke', 'lxmert', 'lxmert_fast', 'm2m_100', 'marian', 'mbart', 'mbart50', 'mbart50_fast', 'mbart_fast', 'megatron_bert', 'mmbt', 'mobilebert', 'mobilebert_fast', 'mpnet', 'mpnet_fast', 'mt5', 'openai', 'openai_fast', 'pegasus', 'pegasus_fast', 'phobert', 'prophetnet', 'rag', 'reformer', 'reformer_fast', 'retribert', 'retribert_fast', 'roberta', 'roberta_fast', 'roformer', 'roformer_fast', 'speech_to_text', 'squeezebert', 'squeezebert_fast', 't5', 't5_fast', 'tapas', 'transfo_xl', 'visual_bert', 'vit', 'wav2vec2', 'xlm', 'xlm_prophetnet', 'xlm_roberta', 'xlm_roberta_fast', 'xlnet', 'xlnet_fast']

BlurrUtil.get_model_architecture[source]

BlurrUtil.get_model_architecture(model_name_or_enum)

Get the architecture for a given model name / enum

Parameters:

  • model_name_or_enum : <class 'inspect._empty'>

    The name of a model class (e.g., 'RobertaForSequenceClassification') or its enum

print(BLURR.get_model_architecture('RobertaForSequenceClassification'))
roberta
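A plausible way to derive the architecture from a model class name is to strip the task suffix and lowercase what remains. This is a sketch under that assumption, with an illustrative (not exhaustive) set of task markers, not blurr's actual implementation:

```python
import re

# Illustrative task markers; the real mapping is presumably more complete
TASK_MARKERS = r"(For|LMHead|Model)"


def get_model_architecture(model_name: str) -> str:
    """Sketch: split the class name at the first task marker and
    lowercase the architecture part."""
    return re.split(TASK_MARKERS, model_name, maxsplit=1)[0].lower()


print(get_model_architecture("RobertaForSequenceClassification"))  # roberta
```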

... and lastly the models (optionally for a given task and/or architecture)

BlurrUtil.get_models[source]

BlurrUtil.get_models(arch:str=None, task:str=None)

The transformer models available for use, optionally filtered by architecture and/or task

Parameters:

  • arch : <class 'str'>, optional

    A transformer architecture (e.g., 'bert')

  • task : <class 'str'>, optional

    A transformer task (e.g., 'TokenClassification')

print(L(BLURR.get_models())[:5])
['AdaptiveEmbedding', 'AlbertForMaskedLM', 'AlbertForMultipleChoice', 'AlbertForPreTraining', 'AlbertForQuestionAnswering']
print(BLURR.get_models(arch='bert')[:5])
['BertForMaskedLM', 'BertForMultipleChoice', 'BertForNextSentencePrediction', 'BertForPreTraining', 'BertForQuestionAnswering']
print(BLURR.get_models(task='TokenClassification')[:5])
['AlbertForTokenClassification', 'BertForTokenClassification', 'BigBirdForTokenClassification', 'CamembertForTokenClassification', 'CanineForTokenClassification']
print(BLURR.get_models(arch='bert', task='TokenClassification'))
['BertForTokenClassification']
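Conceptually, filtering by architecture and/or task amounts to matching a model class name's prefix and suffix. A sketch over a small hypothetical sample of names (the real method presumably scans the transformers namespace):

```python
from typing import List, Optional

# A tiny illustrative subset of transformers model class names
SAMPLE_MODELS = [
    "BertForMaskedLM", "BertForTokenClassification",
    "AlbertForTokenClassification", "BartForConditionalGeneration",
]


def get_models(arch: Optional[str] = None, task: Optional[str] = None,
               models: List[str] = SAMPLE_MODELS) -> List[str]:
    """Sketch: keep names whose prefix matches `arch` and whose
    suffix matches `task`."""
    results = models
    if arch:
        results = [m for m in results if m.lower().startswith(arch.lower())]
    if task:
        results = [m for m in results if m.endswith(task)]
    return results


print(get_models(arch="bert", task="TokenClassification"))
# ['BertForTokenClassification']
```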

Here we define some helpful enums to make it easier to get at the task and architecture you're looking for.

print('--- all tasks ---')
print(L(HF_TASKS))
--- all tasks ---
[<HF_TASKS_ALL.CTC: 1>, <HF_TASKS_ALL.CausalLM: 2>, <HF_TASKS_ALL.Classification: 3>, <HF_TASKS_ALL.ConditionalGeneration: 4>, <HF_TASKS_ALL.EntityClassification: 5>, <HF_TASKS_ALL.EntityPairClassification: 6>, <HF_TASKS_ALL.EntitySpanClassification: 7>, <HF_TASKS_ALL.Generation: 8>, <HF_TASKS_ALL.ImageClassification: 9>, <HF_TASKS_ALL.LMHead: 10>, <HF_TASKS_ALL.LMHeadModel: 11>, <HF_TASKS_ALL.MaskedLM: 12>, <HF_TASKS_ALL.MultipleChoice: 13>, <HF_TASKS_ALL.NextSentencePrediction: 14>, <HF_TASKS_ALL.PreTraining: 15>, <HF_TASKS_ALL.QuestionAnswering: 16>, <HF_TASKS_ALL.QuestionAnsweringSimple: 17>, <HF_TASKS_ALL.RegionToPhraseAlignment: 18>, <HF_TASKS_ALL.SequenceClassification: 19>, <HF_TASKS_ALL.Teacher: 20>, <HF_TASKS_ALL.TokenClassification: 21>, <HF_TASKS_ALL.VisualReasoning: 22>, <HF_TASKS_ALL.merLayer: 23>, <HF_TASKS_ALL.merModel: 24>, <HF_TASKS_ALL.merPreTrainedModel: 25>]
HF_TASKS.Classification
<HF_TASKS_ALL.Classification: 3>
print(L(HF_ARCHITECTURES)[:5])
[<HF_ARCHITECTURES.albert: 1>, <HF_ARCHITECTURES.albert_fast: 2>, <HF_ARCHITECTURES.bart: 3>, <HF_ARCHITECTURES.bart_fast: 4>, <HF_ARCHITECTURES.barthez: 5>]
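Enums like these can be generated from the same name lists with Python's functional `Enum` API, which auto-numbers members from 1. A sketch over the first few architecture names (how blurr actually builds them is an assumption here):

```python
from enum import Enum

# Build an auto-numbered enum from a list of names, mirroring how
# HF_ARCHITECTURES could be derived from get_architectures()
archs = ["albert", "albert_fast", "bart", "bart_fast", "barthez"]
HF_ARCHITECTURES = Enum("HF_ARCHITECTURES", archs)

print(HF_ARCHITECTURES.bart.name)   # bart
print(HF_ARCHITECTURES.bart.value)  # 3
```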

To get all your Hugging Face objects (arch, config, tokenizer, and model)

BlurrUtil.get_hf_objects[source]

BlurrUtil.get_hf_objects(pretrained_model_name_or_path:Union[str, PathLike, NoneType], model_cls:PreTrainedModel, config:Union[PretrainedConfig, str, PathLike]=None, tokenizer_cls:PreTrainedTokenizerBase=None, config_kwargs={}, tokenizer_kwargs={}, model_kwargs={}, cache_dir:Union[str, PathLike]=None)

Given at minimum a pretrained_model_name_or_path and a model_cls (such as AutoModelForSequenceClassification), this method returns all the Hugging Face objects you need to train a model using Blurr

Parameters:

  • pretrained_model_name_or_path : typing.Union[str, os.PathLike, NoneType]

    The name or path of the pretrained model you want to fine-tune

  • model_cls : <class 'transformers.modeling_utils.PreTrainedModel'>

    The model class you want to use (e.g., AutoModelFor)

  • config : typing.Union[transformers.configuration_utils.PretrainedConfig, str, os.PathLike], optional

    A specific configuration instance you want to use. If None, a configuration object will be instantiated using the AutoConfig class along with any supplied `config_kwargs`

  • tokenizer_cls : <class 'transformers.tokenization_utils_base.PreTrainedTokenizerBase'>, optional

    A specific tokenizer class you want to use. If None, a tokenizer will be instantiated using the AutoTokenizer class along with any supplied `tokenizer_kwargs`

  • config_kwargs : <class 'dict'>, optional

    Any keyword arguments you want to pass to `AutoConfig` (only used if you do NOT pass in a config above)

  • tokenizer_kwargs : <class 'dict'>, optional

    Any keyword arguments you want to pass in the creation of your tokenizer

  • model_kwargs : <class 'dict'>, optional

    Any keyword arguments you want to pass in the creation of your model

  • cache_dir : typing.Union[str, os.PathLike], optional

    If you want to change the location Hugging Face objects are cached

Returns:

  • (<class 'str'>, <class 'transformers.configuration_utils.PretrainedConfig'>, <class 'transformers.tokenization_utils_base.PreTrainedTokenizerBase'>, <class 'transformers.modeling_utils.PreTrainedModel'>)

    A tuple containing the architecture (str), config (obj), tokenizer (obj), and model (obj)

How to use:

from transformers import AutoModelForMaskedLM
arch, config, tokenizer, model = BLURR.get_hf_objects("bert-base-cased-finetuned-mrpc",
                                                      model_cls=AutoModelForMaskedLM)

print(arch)
print(type(config))
print(type(tokenizer))
print(type(model))
bert
<class 'transformers.models.bert.configuration_bert.BertConfig'>
<class 'transformers.models.bert.tokenization_bert_fast.BertTokenizerFast'>
<class 'transformers.models.bert.modeling_bert.BertForMaskedLM'>
from transformers import AutoModelForQuestionAnswering
arch, config, tokenizer, model = BLURR.get_hf_objects("fmikaelian/flaubert-base-uncased-squad",
                                                      model_cls=AutoModelForQuestionAnswering)

print(arch)
print(type(config))
print(type(tokenizer))
print(type(model))
flaubert
<class 'transformers.models.flaubert.configuration_flaubert.FlaubertConfig'>
<class 'transformers.models.flaubert.tokenization_flaubert.FlaubertTokenizer'>
<class 'transformers.models.flaubert.modeling_flaubert.FlaubertForQuestionAnsweringSimple'>
from transformers import BertTokenizer, BertForNextSentencePrediction
arch, config, tokenizer, model = BLURR.get_hf_objects("bert-base-cased-finetuned-mrpc",
                                                      config=None,
                                                      tokenizer_cls=BertTokenizer,
                                                      model_cls=BertForNextSentencePrediction)
print(arch)
print(type(config))
print(type(tokenizer))
print(type(model))
bert
<class 'transformers.models.bert.configuration_bert.BertConfig'>
<class 'transformers.models.bert.tokenization_bert.BertTokenizer'>
<class 'transformers.models.bert.modeling_bert.BertForNextSentencePrediction'>

Summary

Using the BLURR object is optional; you're free to build your Hugging Face objects manually, which may sometimes be required given your use case. Most of the time, however, BLURR is sufficient to get you everything you need in one line with a pretrained model name or path and a task-specific transformer model class.