This is an example of how to use blurr for multilabel classification tasks.
# pin the notebook to a specific GPU (device 1 here)
torch.cuda.set_device(1)
print(f'Using GPU #{torch.cuda.current_device()}: {torch.cuda.get_device_name()}')
Using GPU #1: GeForce GTX 1080 Ti

Let's start by building our DataBlock.

# grab just 1% of the training set so this example runs quickly
raw_data = datasets.load_dataset('civil_comments', split='train[:1%]')
len(raw_data)
Using custom data configuration default
Reusing dataset civil_comments (/home/wgilliam/.cache/huggingface/datasets/civil_comments/default/0.9.0/98bdc73fc77a117cf5d17c9977e278c8023c64177a3ed9e0c49f7a5bdf10a47b)
18049
toxic_df = pd.DataFrame(raw_data, columns=list(raw_data.features.keys()))
toxic_df.head()
text toxicity severe_toxicity obscene threat insult identity_attack sexual_explicit
0 This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done! 0.000000 0.000000 0.0 0.0 0.00000 0.000000 0.0
1 Thank you!! This would make my life a lot less anxiety-inducing. Keep it up, and don't let anyone get in your way! 0.000000 0.000000 0.0 0.0 0.00000 0.000000 0.0
2 This is such an urgent design problem; kudos to you for taking it on. Very impressive! 0.000000 0.000000 0.0 0.0 0.00000 0.000000 0.0
3 Is this something I'll be able to install on my site? When will you be releasing it? 0.000000 0.000000 0.0 0.0 0.00000 0.000000 0.0
4 haha you guys are a bunch of losers. 0.893617 0.021277 0.0 0.0 0.87234 0.021277 0.0
lbl_cols = list(toxic_df.columns[2:]); lbl_cols
['severe_toxicity',
 'obscene',
 'threat',
 'insult',
 'identity_attack',
 'sexual_explicit']
# the Civil Comments labels come as fractional annotator scores; round them to hard 0/1 targets
toxic_df = toxic_df.round({col: 0 for col in lbl_cols})
toxic_df = toxic_df.convert_dtypes()

toxic_df.head()
text toxicity severe_toxicity obscene threat insult identity_attack sexual_explicit
0 This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done! 0.000000 0 0 0 0 0 0
1 Thank you!! This would make my life a lot less anxiety-inducing. Keep it up, and don't let anyone get in your way! 0.000000 0 0 0 0 0 0
2 This is such an urgent design problem; kudos to you for taking it on. Very impressive! 0.000000 0 0 0 0 0 0
3 Is this something I'll be able to install on my site? When will you be releasing it? 0.000000 0 0 0 0 0 0
4 haha you guys are a bunch of losers. 0.893617 0 0 0 1 0 0
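
Before moving on, it's worth sanity-checking that the rounding really left us with hard 0/1 targets, and getting a feel for how rare the positive labels are (a minimal sketch; toxic_df and lbl_cols are defined above):

# every label column should now contain only 0s and 1s
assert toxic_df[lbl_cols].isin([0, 1]).all().all()

# per-label positive rates; for this dataset these will be mostly near zero
toxic_df[lbl_cols].mean()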

For our huggingface model, let's use the distilled version of RoBERTa. This should allow us to train with bigger mini-batches without much loss in performance. Even on my 1080Ti, I should be able to train all the parameters (which isn't possible with the roberta-base model).

task = HF_TASKS_ALL.SequenceClassification

pretrained_model_name = "distilroberta-base"
config = AutoConfig.from_pretrained(pretrained_model_name)
config.num_labels = len(lbl_cols)

hf_arch, hf_config, hf_tokenizer, hf_model = BLURR_MODEL_HELPER.get_hf_objects(pretrained_model_name, 
                                                                               task=task, 
                                                                               config=config)

print(hf_arch)
print(type(hf_config))
print(type(hf_tokenizer))
print(type(hf_model))



roberta
<class 'transformers.configuration_roberta.RobertaConfig'>
<class 'transformers.tokenization_roberta.RobertaTokenizer'>
<class 'transformers.modeling_roberta.RobertaForSequenceClassification'>

Note how we have to set config.num_labels to the number of labels we are predicting. Given that our labels are already encoded as 0/1 columns, we use a MultiCategoryBlock with encoded=True and vocab equal to the label column names.

blocks = (
    HF_TextBlock(hf_arch=hf_arch, hf_tokenizer=hf_tokenizer), 
    MultiCategoryBlock(encoded=True, vocab=lbl_cols)
)

dblock = DataBlock(blocks=blocks, 
                   get_x=ColReader('text'), get_y=ColReader(lbl_cols), 
                   splitter=RandomSplitter())
dls = dblock.dataloaders(toxic_df, bs=16)
b = dls.one_batch()
len(b), b[0]['input_ids'].shape, b[1].shape
(2, torch.Size([16, 391]), torch.Size([16, 6]))
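
To verify what the model will actually see, you can round-trip the first sequence in the batch back to text with the tokenizer's standard decode method (a quick sketch):

# decode the first item's input_ids back to text (special tokens like <s> and padding will appear)
print(hf_tokenizer.decode(b[0]['input_ids'][0]))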

With our DataLoaders built, we can now build our Learner and train. We'll use mixed precision so we can train with bigger batches.

model = HF_BaseModelWrapper(hf_model)

learn = Learner(dls, 
                model,
                opt_func=partial(Adam),
                loss_func=BCEWithLogitsLossFlat(),
                metrics=[partial(accuracy_multi, thresh=0.2)],
                cbs=[HF_BaseModelCallback],
                splitter=hf_splitter).to_fp16()

learn.loss_func.thresh = 0.2   # a label is decoded as present when its probability exceeds 0.2
learn.create_opt()             # creates the layer groups based on the "splitter" function
learn.freeze()
learn.blurr_summary()
HF_BaseModelWrapper (Input shape: 16 x 404)
================================================================
Layer (type)         Output Shape         Param #    Trainable 
================================================================
Embedding            16 x 404 x 768       38,603,520 False     
________________________________________________________________
Embedding            16 x 404 x 768       394,752    False     
________________________________________________________________
Embedding            16 x 404 x 768       768        False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Dropout              16 x 12 x 404 x 404  0          False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 404 x 3072      2,362,368  False     
________________________________________________________________
Linear               16 x 404 x 768       2,360,064  False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Dropout              16 x 12 x 404 x 404  0          False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 404 x 3072      2,362,368  False     
________________________________________________________________
Linear               16 x 404 x 768       2,360,064  False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Dropout              16 x 12 x 404 x 404  0          False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 404 x 3072      2,362,368  False     
________________________________________________________________
Linear               16 x 404 x 768       2,360,064  False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Dropout              16 x 12 x 404 x 404  0          False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 404 x 3072      2,362,368  False     
________________________________________________________________
Linear               16 x 404 x 768       2,360,064  False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Dropout              16 x 12 x 404 x 404  0          False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 404 x 3072      2,362,368  False     
________________________________________________________________
Linear               16 x 404 x 768       2,360,064  False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
Dropout              16 x 12 x 404 x 404  0          False     
________________________________________________________________
Linear               16 x 404 x 768       590,592    False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 404 x 3072      2,362,368  False     
________________________________________________________________
Linear               16 x 404 x 768       2,360,064  False     
________________________________________________________________
LayerNorm            16 x 404 x 768       1,536      True      
________________________________________________________________
Dropout              16 x 404 x 768       0          False     
________________________________________________________________
Linear               16 x 768             590,592    True      
________________________________________________________________
Dropout              16 x 768             0          False     
________________________________________________________________
Linear               16 x 6               4,614      True      
________________________________________________________________

Total params: 82,123,014
Total trainable params: 615,174
Total non-trainable params: 81,507,840

Optimizer used: functools.partial(<function Adam at 0x7fae2c31d3b0>)
Loss function: FlattenedLoss of BCEWithLogitsLoss()

Model frozen up to parameter group #2

Callbacks:
  - HF_BaseModelCallback
  - ModelToHalf
  - TrainEvalCallback
  - Recorder
  - ProgressCallback
  - MixedPrecision
preds = model(b[0])
preds.logits.shape, preds
(torch.Size([16, 6]),
 SequenceClassifierOutput(loss=None, logits=tensor([[ 0.1393,  0.0470,  0.1508, -0.2681, -0.2635, -0.2397],
         [ 0.1356,  0.0281,  0.1627, -0.2686, -0.2615, -0.2584],
         [ 0.1340,  0.0284,  0.1719, -0.2799, -0.2501, -0.2654],
         [ 0.1499,  0.0298,  0.1627, -0.2807, -0.2503, -0.2547],
         [ 0.1290,  0.0306,  0.1569, -0.2741, -0.2657, -0.2566],
         [ 0.1582,  0.0368,  0.1544, -0.2580, -0.2420, -0.2499],
         [ 0.1360,  0.0354,  0.1647, -0.2658, -0.2625, -0.2503],
         [ 0.1429,  0.0454,  0.1631, -0.2840, -0.2513, -0.2640],
         [ 0.1230,  0.0259,  0.1764, -0.2625, -0.2455, -0.2526],
         [ 0.1497,  0.0374,  0.1630, -0.2782, -0.2658, -0.2670],
         [ 0.1487,  0.0260,  0.1551, -0.2781, -0.2604, -0.2498],
         [ 0.1462,  0.0450,  0.1670, -0.2637, -0.2520, -0.2404],
         [ 0.1424,  0.0357,  0.1684, -0.2831, -0.2470, -0.2415],
         [ 0.1429,  0.0311,  0.1625, -0.2809, -0.2639, -0.2488],
         [ 0.1368,  0.0349,  0.1565, -0.2686, -0.2583, -0.2413],
         [ 0.1367,  0.0409,  0.1527, -0.2828, -0.2594, -0.2599]],
        device='cuda:1', grad_fn=<AddmmBackward>), hidden_states=None, attentions=None))
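
The classification head returns raw logits. Since BCEWithLogitsLossFlat applies the sigmoid internally, you can get independent per-label probabilities by applying torch.sigmoid yourself (a minimal sketch using the preds computed above):

# sigmoid maps each logit to an independent per-label probability in [0, 1]
probs = torch.sigmoid(preds.logits)
probs.shape   # -> torch.Size([16, 6]): one probability per label column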
learn.lr_find(suggestions=True)
/home/wgilliam/anaconda3/envs/blurr/lib/python3.7/site-packages/fastai/learner.py:53: UserWarning: Could not load the optimizer state.
  if with_opt: warn("Could not load the optimizer state.")
SuggestedLRs(lr_min=0.010000000149011612, lr_steep=0.0014454397605732083)
learn.fit_one_cycle(1, lr_max=1e-2)
epoch train_loss valid_loss accuracy_multi time
0 0.043435 0.037120 0.992841 01:10
learn.unfreeze()
learn.lr_find(suggestions=True, start_lr=1e-12)
learn.fit_one_cycle(2, lr_max=slice(1e-10, 4e-9))
epoch train_loss valid_loss accuracy_multi time
0 0.033136 0.037120 0.992841 01:55
1 0.030737 0.037120 0.992841 01:56
learn.show_results(learner=learn, max_n=2)
text target
0 First, your statement that "no one" has figured out how to do "free health care" is false. ALL developed nations have, with the lone exception of the US--- though Obamacare has taken us closer, much closer. \nYou have challenges in finding workers---- but you did find them, didn't you? Perhaps something is wrong with your wage structure: pay more. \nYour competitors also face the same challenges: the impact won't be on your business, alone. Higher wages mean more money for workers to spend-- at your business.\nIn a nation where the largest employer is a warehousing chain (Walmart) and the other huge similar businesses employ massive numbers of workers, I'm sorry to tell you that a living wage is a necessity. \nYou say you're not wealthy--- but that's a comparative term. How much does your paycheck work out to, per hour? \nOn a personal note, you should take a long look in the mirror and ask yourself how much the well-being of your neighbors means to you--- or if they're just so many robots. []
1 Certainly there has been rage in this community toward the protesters or “terrorists” near Burns. But let’s look at LTD:\n\nTAKES (without asking, without gratitude) hundreds of $$$ per year from my employer on my behalf (I ride LTD once every two or three years). Plans to increase its TAKE in the next few years. TAKES tens of thousands of $$$ from my employer for my fellow employees, who similarly have no use for the local transit system.\n\nTAKES money from countless other businesses whose employees have little or no use for the system.\n\nTAKES space along 6th/7th, etc., for the new EmX line. Disrupts traffic and commerce especially during construction, but to some degree perpetually. TAKES what was once green space along Franklin Blvd., Pioneer Pkwy. TAKES road space for the buses that barge out of stops demanding right of way or barge into your lane because they can't negotiate tight turns.\n\nSo who is TAKING public (and private) property and disrupting our lives? Who are the terrorists? []
learn.loss_func.thresh = 0.02  # lower the decision threshold so weaker signals are decoded as labels
comment = """
Those damned affluent white people should only eat their own food, like cod cakes and boiled potatoes. 
No enchiladas for them!
"""
learn.blurr_predict(comment)
((#1) ['insult'],
 tensor([False, False, False,  True, False, False]),
 tensor([2.8189e-05, 3.8996e-03, 2.9137e-04, 3.1085e-02, 1.4722e-03, 9.4731e-04]))
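
Under the hood, blurr_predict decodes a label as present whenever its sigmoid probability clears learn.loss_func.thresh. Here's a hand-rolled equivalent using the probabilities returned above (a sketch; assumes fastai's tensor helper, or torch.tensor, is in scope):

# only 'insult' (0.031) clears the 0.02 threshold, matching the decoded output above
probs = tensor([2.8189e-05, 3.8996e-03, 2.9137e-04, 3.1085e-02, 1.4722e-03, 9.4731e-04])
[lbl for lbl, p in zip(lbl_cols, probs) if p > learn.loss_func.thresh]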

Cleanup