This module contains core custom models, loss functions, and a default layer group splitter for applying discriminative learning rates to your huggingface models trained via fastai.
torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')
Using GPU #1: GeForce GTX 1080 Ti

Base splitter, model wrapper, and model callback

hf_splitter

hf_splitter(m)

Splits the huggingface model based on various model architecture conventions
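To make the idea of a layer group splitter more concrete, here is a minimal, framework-free sketch. It sorts parameter names into three groups (embeddings, encoder, head) by substring. The `split_param_names` helper, the substrings it looks for, and the parameter names are all assumptions made up for illustration; this is not how hf_splitter is implemented, and a real fastai splitter works on the model's parameters rather than on names.

```python
from typing import Dict, List


def split_param_names(param_names: List[str]) -> List[List[str]]:
    """Group parameter names into [embeddings, encoder, head] (hypothetical convention)."""
    groups: Dict[str, List[str]] = {"embeddings": [], "encoder": [], "head": []}
    for name in param_names:
        if ".embeddings." in name:
            groups["embeddings"].append(name)
        elif ".encoder." in name:
            groups["encoder"].append(name)
        else:
            # Anything else (e.g. a classifier) is treated as the task head.
            groups["head"].append(name)
    # Order matters: earlier groups typically get the smaller learning rates.
    return [groups["embeddings"], groups["encoder"], groups["head"]]


# Made-up parameter names that only mimic a transformer layout.
names = [
    "model.embeddings.word_embeddings.weight",
    "model.encoder.layer.0.attention.weight",
    "model.encoder.layer.1.attention.weight",
    "classifier.dense.weight",
]
layer_groups = split_param_names(names)
print([len(g) for g in layer_groups])  # [1, 2, 1]
```

Each returned group can then be frozen, unfrozen, or given its own learning rate independently of the others.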

class HF_BaseModelWrapper

HF_BaseModelWrapper(hf_model, output_hidden_states=False, output_attentions=False, hf_model_kwargs={}) :: Module

Same as nn.Module, but no need for subclasses to call super().__init__

Note that HF_BaseModelWrapper includes logic to pass in only the inputs your model needs, since not all transformer architectures require or use the same information.
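One way such input filtering could work is to inspect the signature of the model's forward method and drop any keys it does not accept. The sketch below shows that idea with the standard library only. `filter_kwargs` and `toy_forward` are hypothetical names introduced here, and the approach is an assumption about the general technique rather than a copy of the wrapper's code.

```python
import inspect
from typing import Any, Callable, Dict


def filter_kwargs(fn: Callable, inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the entries of `inputs` that `fn` accepts as named parameters."""
    accepted = inspect.signature(fn).parameters
    return {k: v for k, v in inputs.items() if k in accepted}


# Stand-in for a model's forward method that has no use for token_type_ids.
def toy_forward(input_ids, attention_mask=None):
    return len(input_ids)


batch = {
    "input_ids": [101, 2023, 102],
    "attention_mask": [1, 1, 1],
    "token_type_ids": [0, 0, 0],
}
kept = filter_kwargs(toy_forward, batch)
print(sorted(kept))         # ['attention_mask', 'input_ids']
print(toy_forward(**kept))  # 3
```

Filtering on the signature means the same batch dictionary can be fed to architectures with different input requirements without a per-model special case.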

class HF_PreCalculatedLoss

HF_PreCalculatedLoss()

If you want to let your huggingface model calculate the loss for you, make sure you include the labels argument in your inputs and use HF_PreCalculatedLoss as your loss function. Even though we don't really need a loss function per se, we have to provide a custom loss class/function for fastai to function properly (e.g., one with decodes and activation methods). Why? Because these methods get called by methods like show_results to get the actual predictions.
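The shape of such a loss class can be sketched without any deep learning library. In the example below, the call itself returns a placeholder (the real loss is assumed to come from the model), while activation and decodes do the work that prediction display relies on. `PreCalculatedLossSketch` is a hypothetical stand-in operating on plain Python lists, not the HF_PreCalculatedLoss implementation.

```python
import math
from typing import List, Sequence


class PreCalculatedLossSketch:
    """Hypothetical stand-in: the real loss is assumed to come from the model."""

    def __call__(self, preds, targets) -> float:
        # Placeholder value; something else (e.g. a callback) supplies the real loss.
        return 0.0

    def activation(self, logits: Sequence[float]) -> List[float]:
        # Softmax turns raw logits into probabilities (shifted for numerical stability).
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def decodes(self, scores: Sequence[float]) -> int:
        # Argmax picks the index of the predicted class.
        return max(range(len(scores)), key=lambda i: scores[i])


loss_func = PreCalculatedLossSketch()
logits = [0.0150, 0.1525]
probs = loss_func.activation(logits)
print(loss_func(logits, 1), loss_func.decodes(probs))
```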

class HF_BaseModelCallback

HF_BaseModelCallback(after_create=None, before_fit=None, before_epoch=None, before_train=None, before_batch=None, after_pred=None, after_loss=None, before_backward=None, after_backward=None, after_step=None, after_cancel_batch=None, after_batch=None, after_cancel_train=None, after_train=None, before_validate=None, after_cancel_validate=None, after_validate=None, after_cancel_epoch=None, after_epoch=None, after_cancel_fit=None, after_fit=None) :: Callback

Basic class handling tweaks of the training loop by changing a Learner in various events

We use a Callback to handle what is returned from the huggingface model. The return type is ModelOutput (https://huggingface.co/transformers/main_classes/output.html#transformers.file_utils.ModelOutput), which makes it easy to return all the goodies we asked for.

Note that your Learner's loss will be set for you only if the huggingface model returns one and you are using the HF_PreCalculatedLoss loss function.

Also note that anything else you asked the model to return (for example, the last hidden state) will be available via the blurr_model_outputs property attached to your Learner. For example, assuming you are using BERT for a classification task, if you have told your HF_BaseModelWrapper instance to return attentions, you can access them via learn.blurr_model_outputs['attentions'].
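The behavior described in the last three paragraphs can be pictured as splitting a dict-like model output into three parts: the predictions, an optional loss, and everything else. The sketch below uses a plain dictionary as a stand-in for a ModelOutput, with the field names loss, logits, hidden_states, and attentions. `split_model_output` is a hypothetical helper for illustration, not the callback's actual code.

```python
from typing import Any, Dict, Tuple


def split_model_output(output: Dict[str, Any]) -> Tuple[Any, Any, Dict[str, Any]]:
    """Return (pred, loss, extras) from a dict-like model output."""
    pred = output["logits"]
    loss = output.get("loss")  # None when the model did not compute a loss
    extras = {
        k: v
        for k, v in output.items()
        if k not in ("logits", "loss") and v is not None
    }
    return pred, loss, extras


# Plain dict standing in for a model output; the values are made up.
toy_output = {
    "loss": None,
    "logits": [[0.0150, 0.1525]],
    "hidden_states": None,
    "attentions": ["layer-0 attention"],
}
pred, loss, extras = split_model_output(toy_output)
print(pred, loss, sorted(extras))
```

In this picture, `extras` plays the role of what you would read from learn.blurr_model_outputs.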

Sequence classification

The example below demonstrates how to set up your blurr pipeline for a sequence classification task (e.g., a model that requires a single text input).

path = untar_data(URLs.IMDB_SAMPLE)
imdb_df = pd.read_csv(path/'texts.csv')
imdb_df.head()
label text is_valid
0 negative Un-bleeping-believable! Meg Ryan doesn't even look her usual pert lovable self in this, which normally makes me forgive her shallow ticky acting schtick. Hard to believe she was the producer on this dog. Plus Kevin Kline: what kind of suicide trip has his career been on? Whoosh... Banzai!!! Finally this was directed by the guy who did Big Chill? Must be a replay of Jonestown - hollywood style. Wooofff! False
1 positive This is a extremely well-made film. The acting, script and camera-work are all first-rate. The music is good, too, though it is mostly early in the film, when things are still relatively cheery. There are no really superstars in the cast, though several faces will be familiar. The entire cast does an excellent job with the script.<br /><br />But it is hard to watch, because there is no good end to a situation like the one presented. It is now fashionable to blame the British for setting Hindus and Muslims against each other, and then cruelly separating them into two countries. There is som... False
2 negative Every once in a long while a movie will come along that will be so awful that I feel compelled to warn people. If I labor all my days and I can save but one soul from watching this movie, how great will be my joy.<br /><br />Where to begin my discussion of pain. For starters, there was a musical montage every five minutes. There was no character development. Every character was a stereotype. We had swearing guy, fat guy who eats donuts, goofy foreign guy, etc. The script felt as if it were being written as the movie was being shot. The production value was so incredibly low that it felt li... False
3 positive Name just says it all. I watched this movie with my dad when it came out and having served in Korea he had great admiration for the man. The disappointing thing about this film is that it only concentrate on a short period of the man's life - interestingly enough the man's entire life would have made such an epic bio-pic that it is staggering to imagine the cost for production.<br /><br />Some posters elude to the flawed characteristics about the man, which are cheap shots. The theme of the movie "Duty, Honor, Country" are not just mere words blathered from the lips of a high-brassed offic... False
4 negative This movie succeeds at being one of the most unique movies you've seen. However this comes from the fact that you can't make heads or tails of this mess. It almost seems as a series of challenges set up to determine whether or not you are willing to walk out of the movie and give up the money you just paid. If you don't want to feel slighted you'll sit through this horrible film and develop a real sense of pity for the actors involved, they've all seen better days, but then you realize they actually got paid quite a bit of money to do this and you'll lose pity for them just like you've alr... False
task = HF_TASKS_AUTO.SequenceClassification

pretrained_model_name = "roberta-base" # "distilbert-base-uncased" "bert-base-uncased"
hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(pretrained_model_name, task=task)
blocks = (HF_TextBlock(hf_arch=hf_arch, hf_tokenizer=hf_tokenizer), CategoryBlock)

dblock = DataBlock(blocks=blocks, 
                   get_x=ColReader('text'), 
                   get_y=ColReader('label'), 
                   splitter=ColSplitter(col='is_valid'))
dls = dblock.dataloaders(imdb_df, bs=4)
dls.show_batch(dataloaders=dls, max_n=2)
text category
0 Raising Victor Vargas: A Review<br /><br />You know, Raising Victor Vargas is like sticking your hands into a big, steaming bowl of oatmeal. It's warm and gooey, but you're not sure if it feels right. Try as I might, no matter how warm and gooey Raising Victor Vargas became I was always aware that something didn't quite feel right. Victor Vargas suffers from a certain overconfidence on the director's part. Apparently, the director thought that the ethnic backdrop of a Latino family on the lower east side, and an idyllic storyline would make the film critic proof. He was right, but it didn't fool me. Raising Victor Vargas is the story about a seventeen-year old boy called, you guessed it, Victor Vargas (Victor Rasuk) who lives his teenage years chasing more skirt than the Rolling Stones could do in all the years they've toured. The movie starts off in `Ugly Fat' Donna's bedroom where Victor is sure to seduce her, but a cry from outside disrupts his plans when his best-friend Harold (Kevin Rivera) comes-a-looking for him. Caught in the attempt by Harold and his sister, Victor Vargas runs off for damage control. Yet even with the embarrassing implication that he's been boffing the homeliest girl in the neighborhood, nothing dissuades young Victor from going off on the hunt for more fresh meat. On a hot, New York City day they make way to the local public swimming pool where Victor's eyes catch a glimpse of the lovely young nymph Judy (Judy Marte), who's not just pretty, but a strong and independent too. The relationship that develops between Victor and Judy becomes the focus of the film. The story also focuses on Victor's family that is comprised of his grandmother or abuelita (Altagracia Guzman), his brother Nino (also played by real life brother to Victor, Silvestre Rasuk) and his sister Vicky (Krystal Rodriguez). The action follows Victor between scenes with Judy and scenes with his family. 
Victor tries to cope with being an oversexed pimp-daddy, his feelings for Judy and his grandmother's conservative Catholic upbringing.<br /><br />The problems that arise from Raising Victor Vargas are a few, but glaring errors. Throughout the film you get to know certain characters like Vicky, Nino, Grandma, negative
1 Many neglect that this isn't just a classic due to the fact that it's the first 3D game, or even the first shoot-'em-up. It's also one of the first stealth games, one of the only(and definitely the first) truly claustrophobic games, and just a pretty well-rounded gaming experience in general. With graphics that are terribly dated today, the game thrusts you into the role of B.J.(don't even *think* I'm going to attempt spelling his last name!), an American P.O.W. caught in an underground bunker. You fight and search your way through tunnels in order to achieve different objectives for the six episodes(but, let's face it, most of them are just an excuse to hand you a weapon, surround you with Nazis and send you out to waste one of the Nazi leaders). The graphics are, as I mentioned before, quite dated and very simple. The least detailed of basically any 3D game released by a professional team of creators. If you can get over that, however(and some would suggest that this simplicity only adds to the effect the game has on you), then you've got one heck of a good shooter/sneaking game. The game play consists of searching for keys, health and ammo, blasting enemies(aforementioned Nazis, and a "boss enemy" per chapter) of varying difficulty(which, of course, grows as you move further in the game), unlocking doors and looking for secret rooms. There is a bonus count after each level is beaten... it goes by how fast you were(basically, if you beat the 'par time', which is the time it took a tester to go through the same level; this can be quite fun to try and beat, and with how difficult the levels are to find your way in, they are even challenging after many play-throughs), how much Nazi gold(treasure) you collected and how many bad guys you killed. Basically, if you got 100% of any of aforementioned, you get a bonus, helping you reach the coveted high score placings. The game (mostly, but not always) allows for two contrastingly different methods of playing... 
stealthily or gunning down anything and everything you see. You can either run or walk, and amongst your weapons is also a knife... running is heard instantly the moment you enter the same room as the guard, as is gunshots. Many guards are found standing with their backs turned to you positive

Training

We'll also add custom summary methods for blurr learners/models that work with dictionary inputs.

model = HF_BaseModelWrapper(hf_model)

learn = Learner(dls, 
                model,
                opt_func=partial(Adam),
                loss_func=CrossEntropyLossFlat(),
                metrics=[accuracy],
                cbs=[HF_BaseModelCallback],
                splitter=hf_splitter)

learn.create_opt()             # -> will create your layer groups based on your "splitter" function
learn.freeze()

.to_fp16() requires a GPU, so it had to be removed for the tests to run on GitHub. Let's check that we can get predictions.

b = dls.one_batch()
learn.model(b[0])
SequenceClassifierOutput(loss=None, logits=tensor([[0.0150, 0.1525],
        [0.0107, 0.1476],
        [0.0174, 0.1535],
        [0.0201, 0.1535]], device='cuda:1', grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)

blurr_module_summary

blurr_module_summary(learn, *xb)

Print a summary of model using xb

Learner.blurr_summary

Learner.blurr_summary()

Print a summary of the model, optimizer and loss function.

We have to create our own summary methods above because fastai's summary only works when inputs are represented by a single tensor. In the case of huggingface transformers, however, a single sequence is represented by multiple tensors (in a dictionary).

The change needed to make this work is so minor that the fastai library can hopefully be updated to support this use case.
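To illustrate the difference between the two input shapes, here is a small sketch of a helper that reports the batch size for either a single batch-first sequence or a dictionary of them. Nested lists stand in for tensors, and `batch_size` is a hypothetical name. It shows the kind of small accommodation dictionary inputs need, not the change blurr actually makes.

```python
from collections.abc import Mapping


def batch_size(xb) -> int:
    """Number of items in a batch given as one sequence or as a dict of sequences."""
    if isinstance(xb, Mapping):
        # Every value holds one entry per item, so any value gives the batch size.
        first = next(iter(xb.values()))
        return len(first)
    return len(xb)


# Nested lists stand in for batch-first tensors.
single = [[1, 2, 3], [4, 5, 6]]
as_dict = {
    "input_ids": [[1, 2, 3], [4, 5, 6]],
    "attention_mask": [[1, 1, 1], [1, 1, 0]],
}
print(batch_size(single), batch_size(as_dict))  # 2 2
```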

 
print(len(learn.opt.param_groups))
3
learn.lr_find(suggestions=True)
SuggestedLRs(lr_min=7.585775847473997e-08, lr_steep=0.010964781977236271)
learn.fit_one_cycle(1, lr_max=1e-3)
epoch  train_loss  valid_loss  accuracy  time
0      0.359437    0.283850    0.895000  00:21

Showing results

And here we create a @typedispatched implementation of Learner.show_results.

learn.show_results(learner=learn, max_n=2, trunc_at=500)
text category target
0 The trouble with the book, "Memoirs of a Geisha" is that it had Japanese surfaces but underneath the surfaces it was all an American man's way of thinking. Reading the book is like watching a magnificent ballet with great music, sets, and costumes yet performed by barnyard animals dressed in those costumes—so far from Japanese ways of thinking were the characters.<br /><br />The movie isn't about Japan or real geisha. It is a story about a few American men's mistaken ideas about Japan and geish negative negative
1 <br /><br />I'm sure things didn't exactly go the same way in the real life of Homer Hickam as they did in the film adaptation of his book, Rocket Boys, but the movie "October Sky" (an anagram of the book's title) is good enough to stand alone. I have not read Hickam's memoirs, but I am still able to enjoy and understand their film adaptation. The film, directed by Joe Johnston and written by Lewis Colick, records the story of teenager Homer Hickam (Jake Gyllenhaal), beginning in October of 195 positive positive

Learner.blurr_predict

Learner.blurr_predict(item, rm_type_tfms=None, with_input=False)

As with summary, we need to replace fastai's Learner.predict method with the one above, which is able to work with inputs that are represented by multiple tensors in a dictionary.

learn.blurr_predict('I really liked the movie')
('positive', tensor(1), tensor([0.2876, 0.7124]))
learn.unfreeze()
learn.fit_one_cycle(3, lr_max=slice(1e-7, 1e-4))
epoch  train_loss  valid_loss  accuracy  time
0      0.282943    0.239314    0.920000  00:34
1      0.189503    0.268762    0.905000  00:34
2      0.131114    0.293697    0.905000  00:34
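Passing lr_max=slice(1e-7, 1e-4) with three layer groups is what puts discriminative learning rates to work. As far as I understand fastai's behavior, the rates between the two ends of the slice are spaced geometrically across the groups. The sketch below reproduces that spacing in plain Python. `spread_lrs` is a hypothetical helper written for this illustration, not fastai's own function.

```python
from typing import List


def spread_lrs(start: float, stop: float, n_groups: int) -> List[float]:
    """Geometrically space `n_groups` learning rates from `start` to `stop`."""
    if n_groups == 1:
        return [stop]
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step**i for i in range(n_groups)]


# Three layer groups, as reported by len(learn.opt.param_groups) earlier.
lrs = spread_lrs(1e-7, 1e-4, 3)
print([f"{lr:.2e}" for lr in lrs])  # ['1.00e-07', '3.16e-06', '1.00e-04']
```

Under this assumption, the earliest layer group trains with the smallest rate and the task head with the largest.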
learn.recorder.plot_loss()
learn.show_results(learner=learn, max_n=2, trunc_at=500)
text category target
0 The trouble with the book, "Memoirs of a Geisha" is that it had Japanese surfaces but underneath the surfaces it was all an American man's way of thinking. Reading the book is like watching a magnificent ballet with great music, sets, and costumes yet performed by barnyard animals dressed in those costumes—so far from Japanese ways of thinking were the characters.<br /><br />The movie isn't about Japan or real geisha. It is a story about a few American men's mistaken ideas about Japan and geish negative negative
1 <br /><br />I'm sure things didn't exactly go the same way in the real life of Homer Hickam as they did in the film adaptation of his book, Rocket Boys, but the movie "October Sky" (an anagram of the book's title) is good enough to stand alone. I have not read Hickam's memoirs, but I am still able to enjoy and understand their film adaptation. The film, directed by Joe Johnston and written by Lewis Colick, records the story of teenager Homer Hickam (Jake Gyllenhaal), beginning in October of 195 positive positive
learn.blurr_predict("This was a really good movie")
('positive', tensor(1), tensor([0.0857, 0.9143]))
learn.blurr_predict("Acting was so bad it was almost funny.")
('negative', tensor(0), tensor([0.8838, 0.1162]))
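The tuples returned by blurr_predict above hold the decoded label, the class index, and the per-class probabilities, in that order. The sketch below unpacks one such result, with plain Python values standing in for the tensors. `describe_prediction` is a hypothetical helper, and the numbers are copied from the first prediction shown earlier.

```python
from typing import List, Tuple


def describe_prediction(pred: Tuple[str, int, List[float]]) -> str:
    """Format a (label, class index, probabilities) tuple as a short string."""
    label, class_idx, probs = pred
    # The probability at the predicted index is the model's confidence.
    return f"{label} (class {class_idx}, p={probs[class_idx]:.4f})"


# Plain values standing in for ('positive', tensor(1), tensor([0.2876, 0.7124])).
example = ("positive", 1, [0.2876, 0.7124])
print(describe_prediction(example))  # positive (class 1, p=0.7124)
```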

Inference

learn.export(fname='seq_class_learn_export.pkl')
inf_learn = load_learner(fname='seq_class_learn_export.pkl')
inf_learn.blurr_predict("This movie should not be seen by anyone!!!!")
('negative', tensor(0), tensor([0.9020, 0.0980]))

Tests

The tests below ensure that the core training code above works for all pretrained sequence classification models available in huggingface. These tests are excluded from the CI workflow because of how long they would take to run and the amount of data that would need to be downloaded.

Note: Feel free to modify the code below to test whatever pretrained classification models you are working with. If any of your pretrained sequence classification models fail, please submit a GitHub issue (or a PR if you'd like to fix it yourself).

try: del learn
except NameError: pass  # no learner left over from the sections above
torch.cuda.empty_cache()
BLURR_MODEL_HELPER.get_models(task='SequenceClassification')
[transformers.modeling_albert.AlbertForSequenceClassification,
 transformers.modeling_auto.AutoModelForSequenceClassification,
 transformers.modeling_bart.BartForSequenceClassification,
 transformers.modeling_bert.BertForSequenceClassification,
 transformers.modeling_camembert.CamembertForSequenceClassification,
 transformers.modeling_deberta.DebertaForSequenceClassification,
 transformers.modeling_distilbert.DistilBertForSequenceClassification,
 transformers.modeling_electra.ElectraForSequenceClassification,
 transformers.modeling_flaubert.FlaubertForSequenceClassification,
 transformers.modeling_funnel.FunnelForSequenceClassification,
 transformers.modeling_gpt2.GPT2ForSequenceClassification,
 transformers.modeling_longformer.LongformerForSequenceClassification,
 transformers.modeling_mobilebert.MobileBertForSequenceClassification,
 transformers.modeling_openai.OpenAIGPTForSequenceClassification,
 transformers.modeling_reformer.ReformerForSequenceClassification,
 transformers.modeling_roberta.RobertaForSequenceClassification,
 transformers.modeling_squeezebert.SqueezeBertForSequenceClassification,
 transformers.modeling_xlm.XLMForSequenceClassification,
 transformers.modeling_xlm_roberta.XLMRobertaForSequenceClassification,
 transformers.modeling_xlnet.XLNetForSequenceClassification]
pretrained_model_names = [
    'albert-base-v1',
    'facebook/bart-base',
    'bert-base-uncased',
    'camembert-base',
    'distilbert-base-uncased',
    'monologg/electra-small-finetuned-imdb',
    'flaubert/flaubert_small_cased', 
    'allenai/longformer-base-4096',
    'google/mobilebert-uncased',
    'roberta-base',
    'xlm-mlm-en-2048',
    'xlm-roberta-base',
    'xlnet-base-cased'
]
path = untar_data(URLs.IMDB_SAMPLE)

model_path = Path('models')
imdb_df = pd.read_csv(path/'texts.csv')
#hide_output
task = HF_TASKS_AUTO.SequenceClassification
bsz = 2
seq_sz = 128

test_results = []
for model_name in pretrained_model_names:
    error=None
    
    print(f'=== {model_name} ===\n')
    
    hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(model_name, 
                                                                                   task=task, 
                                                                                   config_kwargs={'num_labels': 2})
    
    print(f'architecture:\t{hf_arch}\ntokenizer:\t{type(hf_tokenizer).__name__}\nmodel:\t\t{type(hf_model).__name__}\n')

    blocks = (HF_TextBlock(hf_arch=hf_arch, hf_tokenizer=hf_tokenizer, max_length=seq_sz, padding='max_length'), 
              CategoryBlock)

    dblock = DataBlock(blocks=blocks, 
                       get_x=ColReader('text'), 
                       get_y=ColReader('label'), 
                       splitter=ColSplitter(col='is_valid'))
    
    dls = dblock.dataloaders(imdb_df, bs=bsz)
    
    model = HF_BaseModelWrapper(hf_model)
    learn = Learner(dls, 
                    model,
                    opt_func=partial(Adam),
                    loss_func=CrossEntropyLossFlat(),
                    metrics=[accuracy],
                    cbs=[HF_BaseModelCallback],
                    splitter=hf_splitter)

    learn.create_opt()             # -> will create your layer groups based on your "splitter" function
    learn.freeze()
    
    b = dls.one_batch()
    
    try:
        print('*** TESTING DataLoaders ***')
        test_eq(len(b), 2)  # a batch is an (inputs, targets) tuple, whatever bsz is
        test_eq(len(b[0]['input_ids']), bsz)
        test_eq(b[0]['input_ids'].shape, torch.Size([bsz, seq_sz]))
        test_eq(len(b[1]), bsz)

        print('*** TESTING One pass through the model ***')
        preds = learn.model(b[0])
        test_eq(len(preds[0]), bsz)
        test_eq(preds[0].shape, torch.Size([bsz, 2]))

        print('*** TESTING Training/Results ***')
        learn.fit_one_cycle(1, lr_max=1e-3)

        test_results.append((hf_arch, type(hf_tokenizer).__name__, type(hf_model).__name__, 'PASSED', ''))
        learn.show_results(learner=learn, max_n=2, trunc_at=250)
    except Exception as err:
        test_results.append((hf_arch, type(hf_tokenizer).__name__, type(hf_model).__name__, 'FAILED', err))
    finally:
        # cleanup
        del learn; torch.cuda.empty_cache()
    arch         tokenizer            model                                result  error
0   albert       AlbertTokenizer      AlbertForSequenceClassification      PASSED
1   bart         BartTokenizer        BartForSequenceClassification        PASSED
2   bert         BertTokenizer        BertForSequenceClassification        PASSED
3   camembert    CamembertTokenizer   CamembertForSequenceClassification   PASSED
4   distilbert   DistilBertTokenizer  DistilBertForSequenceClassification  PASSED
5   electra      ElectraTokenizer     ElectraForSequenceClassification     PASSED
6   flaubert     FlaubertTokenizer    FlaubertForSequenceClassification    PASSED
7   longformer   LongformerTokenizer  LongformerForSequenceClassification  PASSED
8   mobilebert   MobileBertTokenizer  MobileBertForSequenceClassification  PASSED
9   roberta      RobertaTokenizer     RobertaForSequenceClassification     PASSED
10  xlm          XLMTokenizer         XLMForSequenceClassification         PASSED
11  xlm_roberta  XLMRobertaTokenizer  XLMRobertaForSequenceClassification  PASSED
12  xlnet        XLNetTokenizer       XLNetForSequenceClassification       PASSED
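The test loop above follows a collect-and-continue pattern: each model is checked inside a try block, and a PASSED or FAILED row is recorded either way, so one failing architecture does not stop the run. A stripped-down sketch of that pattern is shown below. `run_checks`, `toy_check`, and the model names are hypothetical and exist only for this illustration.

```python
from typing import Callable, List, Tuple


def run_checks(
    names: List[str], check: Callable[[str], None]
) -> List[Tuple[str, str, str]]:
    """Run `check` on every name and record (name, result, error) rows."""
    results = []
    for name in names:
        try:
            check(name)
            results.append((name, "PASSED", ""))
        except Exception as err:
            # Record the failure and keep going with the next name.
            results.append((name, "FAILED", str(err)))
    return results


def toy_check(name: str) -> None:
    # Hypothetical check: pretend any name containing "broken" fails.
    if "broken" in name:
        raise ValueError(f"unsupported: {name}")


results = run_checks(["model-a", "model-broken", "model-b"], toy_check)
for row in results:
    print(row)
```

Keeping the error text in the row makes it easy to render the results as a table like the one above and to see at a glance which architectures need attention.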

Cleanup