This module contains the custom models, loss functions, splitters, etc., for question answering tasks
 
What we're running with at the time this documentation was generated:
torch: 1.9.0+cu102
fastai: 2.5.2
transformers: 4.10.0

Question Answering

Given a document (the context) and a question, the objective of these models is to predict the start and end tokens of the correct answer as it exists in the context.
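
To make the objective concrete, here's a minimal sketch of what the start/end targets look like as token positions (using a stock bert-base-uncased tokenizer rather than the fine-tuned model loaded below):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

question = 'What did George Lucas create?'
context = 'George Lucas created Star Wars in 1977.'

# tokenize the (question, context) pair just as the model will see it
tokens = tokenizer(question, context).tokens()

# the model's job is to predict these two indices: the positions of the
# first and last tokens of the answer span ("Star Wars")
start_idx, end_idx = tokens.index('star'), tokens.index('wars')
print(tokens[start_idx:end_idx + 1])    # ['star', 'wars']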

Again, we'll use a subset of the pre-processed SQuAD v2 dataset for our purposes below.

# squad_df = pd.read_csv('./data/task-question-answering/squad_cleaned.csv'); len(squad_df)

# sample
squad_df = pd.read_csv('./squad_sample.csv'); len(squad_df)
1000
squad_df.head(2)
id title context question answers ds_type answer_text is_impossible
0 56be85543aeaaa14008c9063 Beyoncé Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five G... When did Beyonce start becoming popular? {'text': ['in the late 1990s'], 'answer_start': [269]} train in the late 1990s False
1 56be85543aeaaa14008c9065 Beyoncé Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five G... What areas did Beyonce compete in when she was growing up? {'text': ['singing and dancing'], 'answer_start': [207]} train singing and dancing False
pretrained_model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'
hf_model_cls = AutoModelForQuestionAnswering

hf_arch, hf_config, hf_tokenizer, hf_model = BLURR.get_hf_objects(pretrained_model_name, model_cls=hf_model_cls)

# # here's a pre-trained roberta model for squad you can try too
# pretrained_model_name = "ahotrod/roberta_large_squad2"
# hf_arch, hf_config, hf_tokenizer, hf_model = BLURR.get_hf_objects(pretrained_model_name, 
#                                                                   model_cls=AutoModelForQuestionAnswering)

# # here's a pre-trained xlm model for squad you can try too
# pretrained_model_name = 'xlm-mlm-ende-1024'
# hf_arch, hf_config, hf_tokenizer, hf_model = BLURR.get_hf_objects(pretrained_model_name,
#                                                                   model_cls=AutoModelForQuestionAnswering)
squad_df = squad_df.apply(partial(pre_process_squad, hf_arch=hf_arch, hf_tokenizer=hf_tokenizer), axis=1)
max_seq_len = 128
# keep only answerable examples whose tokenized input fits within the model's window
squad_df = squad_df[(squad_df.tokenized_input_len < max_seq_len) & (squad_df.is_impossible == False)]
# the targets are token positions, so the "vocab" is simply the indices [0, max_seq_len)
vocab = list(range(max_seq_len))
# vocab = dict(enumerate(range(max_seq_len)));
# truncate only the context (the second text when padding right, the first otherwise)
trunc_strat = 'only_second' if (hf_tokenizer.padding_side == 'right') else 'only_first'

before_batch_tfm = HF_QABeforeBatchTransform(hf_arch, hf_config, hf_tokenizer, hf_model,
                                             max_length=max_seq_len, 
                                             truncation=trunc_strat, 
                                             tok_kwargs={ 'return_special_tokens_mask': True })

blocks = (
    HF_TextBlock(before_batch_tfm=before_batch_tfm, input_return_type=HF_QuestionAnswerInput), 
    CategoryBlock(vocab=vocab),
    CategoryBlock(vocab=vocab)
)

# question comes first for right-padded tokenizers; left-padding models (e.g., XLNet) expect the reverse
def get_x(x):
    return (x.question, x.context) if (hf_tokenizer.padding_side == 'right') else (x.context, x.question)

dblock = DataBlock(blocks=blocks, 
                   get_x=get_x,
                   get_y=[ColReader('tok_answer_start'), ColReader('tok_answer_end')],
                   splitter=RandomSplitter(),
                   n_inp=1)
dls = dblock.dataloaders(squad_df, bs=4)
len(dls.vocab), dls.vocab[0], dls.vocab[1]
(2,
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127])
dls.show_batch(dataloaders=dls, max_n=2)
text start/end answer
0 how many miles was the village frederic born in located to the west of warsaw? fryderyk chopin was born in zelazowa wola, 46 kilometres ( 29 miles ) west of warsaw, in what was then the duchy of warsaw, a polish state established by napoleon. the parish baptismal record gives his birthday as 22 february 1810, and cites his given names in the latin form fridericus franciscus ( in polish, he was fryderyk franciszek ). however, the composer and his family used the birthdate 1 march, [ n 2 ] which is now generally accepted as the correct date. (35, 36) 29
1 where did beyonce perform in 2011? in 2011, documents obtained by wikileaks revealed that beyonce was one of many entertainers who performed for the family of libyan ruler muammar gaddafi. rolling stone reported that the music industry was urging them to return the money they earned for the concerts ; a spokesperson for beyonce later confirmed to the huffington post that she donated the money to the clinton bush haiti fund. later that year she became the first solo female artist to headline the main pyramid stage at the 2011 glastonbury festival in over twenty years, and was named the highest - paid performer in the world per minute. (102, 107) glastonbury festival

Training

Here we create a question/answer specific subclass of HF_BaseModelCallback in order to grab both the start and end predictions (a sketch of the core idea follows the parameter list below). We also add a new loss function that can handle multiple targets.

class HF_QstAndAnsModelCallback[source]

HF_QstAndAnsModelCallback(after_create=None, before_fit=None, before_epoch=None, before_train=None, before_batch=None, after_pred=None, after_loss=None, before_backward=None, before_step=None, after_cancel_step=None, after_step=None, after_cancel_batch=None, after_batch=None, after_cancel_train=None, after_train=None, before_validate=None, after_cancel_validate=None, after_validate=None, after_cancel_epoch=None, after_epoch=None, after_cancel_fit=None, after_fit=None) :: HF_BaseModelCallback

The prediction is a combination of the start/end logits

Parameters:

  • after_create : <class 'NoneType'>, optional

  • before_fit : <class 'NoneType'>, optional

  • before_epoch : <class 'NoneType'>, optional

  • before_train : <class 'NoneType'>, optional

  • before_batch : <class 'NoneType'>, optional

  • after_pred : <class 'NoneType'>, optional

  • after_loss : <class 'NoneType'>, optional

  • before_backward : <class 'NoneType'>, optional

  • before_step : <class 'NoneType'>, optional

  • after_cancel_step : <class 'NoneType'>, optional

  • after_step : <class 'NoneType'>, optional

  • after_cancel_batch : <class 'NoneType'>, optional

  • after_batch : <class 'NoneType'>, optional

  • after_cancel_train : <class 'NoneType'>, optional

  • after_train : <class 'NoneType'>, optional

  • before_validate : <class 'NoneType'>, optional

  • after_cancel_validate : <class 'NoneType'>, optional

  • after_validate : <class 'NoneType'>, optional

  • after_cancel_epoch : <class 'NoneType'>, optional

  • after_epoch : <class 'NoneType'>, optional

  • after_cancel_fit : <class 'NoneType'>, optional

  • after_fit : <class 'NoneType'>, optional
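
To see what the callback contributes, here's a minimal sketch of the core idea; it's an illustration rather than the library's actual code, and it assumes the wrapped model's raw Hugging Face output is available as the prediction:

from fastai.callback.core import Callback

class QstAndAnsCallbackSketch(Callback):
    def after_pred(self):
        # Hugging Face QA models return an output object carrying `start_logits`
        # and `end_logits`; repackage them as a tuple so fastai can pair each
        # set of logits with its corresponding (start, end) target
        self.learn.pred = (self.pred.start_logits, self.pred.end_logits)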

And here we provide a custom loss function for our question answering task, expanding on some techniques learned from here and here.

In fact, this new loss function can be used in many other multi-modal architectures, with any mix of loss functions. For example, it can be amended to include the is_impossible task, as well as the start/end token tasks, in the SQuAD v2 dataset (see the sketch after the parameter list below).

class MultiTargetLoss[source]

MultiTargetLoss(loss_classes:List[Callable]=[<class 'fastai.losses.CrossEntropyLossFlat'>, <class 'fastai.losses.CrossEntropyLossFlat'>], loss_classes_kwargs:List[dict]=[{}, {}], weights:Union[List[float], List[int]]=[1, 1], reduction:str='mean') :: Module

Provides the ability to apply different loss functions to multi-modal targets/predictions

Parameters:

  • loss_classes : typing.List[typing.Callable], optional

    The loss function for each target

  • loss_classes_kwargs : typing.List[dict], optional

    Any kwargs you want to pass to the loss functions above

  • weights : typing.Union[typing.List[float], typing.List[int]], optional

    The weights you want to apply to each loss (default: [1,1])

  • reduction : <class 'str'>, optional

    The `reduction` parameter of the loss function (default: 'mean')
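
The gist of MultiTargetLoss can be sketched in a few lines. This is a hedged approximation rather than the library's implementation (the real class also wires up the activation/decodes plumbing fastai uses when decoding predictions), and it assumes fastai's CrossEntropyLossFlat:

from torch import nn
from fastai.losses import CrossEntropyLossFlat

class MultiTargetLossSketch(nn.Module):
    def __init__(self, loss_classes=(CrossEntropyLossFlat, CrossEntropyLossFlat),
                 weights=(1, 1), reduction='mean'):
        super().__init__()
        # one loss instance per target (here: the start and end token positions)
        self.loss_funcs = [cls(reduction=reduction) for cls in loss_classes]
        self.weights = weights

    def forward(self, outputs, *targets):
        # `outputs` holds one prediction per target (e.g., start and end logits);
        # apply each loss to its pair and sum the weighted results
        return sum(w * lf(o, t) for lf, w, o, t in zip(self.loss_funcs, self.weights, outputs, targets))

Extending this to SQuAD v2's is_impossible task would amount to adding a third loss/target pair.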

model = HF_BaseModelWrapper(hf_model)

learn = Learner(dls, 
                model,
                opt_func=partial(Adam, decouple_wd=True),
                cbs=[HF_QstAndAnsModelCallback],
                splitter=hf_splitter)

learn.loss_func=MultiTargetLoss()
learn.create_opt()                # -> will create your layer groups based on your "splitter" function
learn.freeze()

Notice above how I had to set the loss function after creating the Learner object. I'm not sure why, but assigning MultiTargetLoss when creating the Learner prevents it from being exported later.

learn.summary()
print(len(learn.opt.param_groups))
3
x, y_start, y_end = dls.one_batch()
preds = learn.model(x)
len(preds),preds[0].shape
(2, torch.Size([4, 127]))
learn.lr_find(suggest_funcs=[minimum, steep, valley, slide])
SuggestedLRs(minimum=0.003981071710586548, steep=0.0010000000474974513, valley=0.0008317637839354575, slide=0.0020892962347716093)
learn.fit_one_cycle(3, lr_max=1e-3)
epoch train_loss valid_loss time
0 4.139011 1.324671 00:04
1 2.428752 0.630240 00:04
2 1.736481 0.554712 00:04

Showing results

Below we'll add in additional functionality to more intuitively show the results of our model.

learn.show_results(learner=learn, skip_special_tokens=True, max_n=2, trunc_at=500)
text start/end answer pred start/end pred answer
0 where did beyonce exclusively release her single, formation? on february 6, 2016, one day before her performance at the super bowl, beyonce released a new single exclusively on music streaming service tidal called " formation ". (38, 39) tidal (38, 39) tidal
1 what word does " bey hive " derive from? the bey hive is the name given to beyonce's fan base. fans were previously titled " the beyontourage ", ( a portmanteau of beyonce and entourage ). the name bey hive derives from the word beehive, purposely misspelled to resemble her first name, and was penned by fans after petitions on the online social networking service twitter and online news reports during competitions. (58, 61) beehive (58, 61) beehive

... and let's see how Learner.blurr_predict works with question answering tasks

inf_df = pd.DataFrame.from_dict([{
    'question': 'What did George Lucas make?',
    'context': 'George Lucas created Star Wars in 1977. He directed and produced it.'   
}], 
    orient='columns')

learn.blurr_predict(inf_df.iloc[0])
[(('11', '13'),
  (#2) [tensor(11),tensor(13)],
  (#2) [tensor([3.0268e-07, 6.9921e-08, 5.9632e-09, 1.2420e-08, 8.5584e-09, 7.5558e-09,
        9.2788e-10, 3.0270e-07, 3.8582e-04, 2.7305e-05, 8.3689e-04, 9.9857e-01,
        1.5739e-04, 4.2566e-07, 7.8813e-06, 5.0365e-07, 4.5226e-06, 4.6080e-06,
        3.3246e-08, 2.2053e-06, 8.2759e-07, 1.2332e-07, 2.5745e-07]),tensor([1.6131e-03, 8.3521e-05, 5.9296e-06, 2.2950e-06, 9.9383e-06, 6.0271e-06,
        2.4133e-05, 1.6132e-03, 3.1182e-05, 1.2563e-04, 9.7756e-05, 1.4331e-05,
        5.6870e-02, 5.2701e-01, 2.6772e-02, 3.5492e-01, 2.6779e-04, 8.0594e-05,
        2.0279e-04, 1.8910e-04, 6.8663e-03, 2.1703e-02, 1.4861e-03])])]
inf_df = pd.DataFrame.from_dict([
    {
        'question': 'What did George Lucas make?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.'   
    }, {
        'question': 'What year did Star Wars come out?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.' 
    }, {
        'question': 'What did George Lucas do?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.' 
    }], 
    orient='columns')

learn.blurr_predict(inf_df)
[(('11', '13'),
  (#2) [tensor(11),tensor(13)],
  (#2) [tensor([3.0268e-07, 6.9921e-08, 5.9632e-09, 1.2420e-08, 8.5584e-09, 7.5558e-09,
        9.2787e-10, 3.0270e-07, 3.8582e-04, 2.7305e-05, 8.3689e-04, 9.9857e-01,
        1.5739e-04, 4.2566e-07, 7.8812e-06, 5.0365e-07, 4.5226e-06, 4.6080e-06,
        3.3246e-08, 2.2053e-06, 8.2759e-07, 1.2332e-07, 2.5745e-07, 5.4841e-10,
        7.3210e-10]),tensor([1.6131e-03, 8.3521e-05, 5.9296e-06, 2.2950e-06, 9.9383e-06, 6.0271e-06,
        2.4133e-05, 1.6132e-03, 3.1182e-05, 1.2563e-04, 9.7755e-05, 1.4331e-05,
        5.6870e-02, 5.2701e-01, 2.6772e-02, 3.5492e-01, 2.6779e-04, 8.0594e-05,
        2.0279e-04, 1.8910e-04, 6.8663e-03, 2.1703e-02, 1.4861e-03, 1.3298e-06,
        8.1794e-07])]),
 (('16', '17'),
  (#2) [tensor(16),tensor(17)],
  (#2) [tensor([1.8138e-06, 3.6914e-06, 7.9606e-08, 5.7100e-08, 4.9475e-08, 3.7448e-08,
        5.3773e-08, 6.4744e-08, 2.9010e-08, 1.8139e-06, 1.5196e-06, 1.7933e-06,
        2.9139e-06, 5.4099e-06, 2.6638e-06, 6.3237e-05, 9.9991e-01, 2.0777e-06,
        3.3066e-07, 3.0617e-07, 6.3106e-08, 3.2725e-07, 6.9513e-07, 9.8958e-07,
        1.8122e-06]),tensor([3.1355e-03, 6.7197e-04, 5.7665e-04, 2.1287e-04, 8.4543e-05, 1.6099e-04,
        1.0164e-04, 1.8573e-04, 4.5048e-04, 3.1355e-03, 6.3488e-04, 1.0317e-03,
        7.2071e-04, 3.0524e-04, 1.0807e-03, 1.1295e-03, 7.2798e-03, 9.5853e-01,
        1.0582e-02, 8.9976e-04, 8.4763e-04, 7.5183e-04, 2.1189e-03, 2.2401e-03,
        3.1360e-03])]),
 (('17', '21'),
  (#2) [tensor(17),tensor(21)],
  (#2) [tensor([8.9343e-06, 3.5278e-07, 8.2588e-08, 1.7797e-07, 7.2601e-08, 9.2827e-08,
        2.2630e-08, 8.9359e-06, 4.9041e-03, 5.3172e-04, 1.2702e-01, 4.9925e-04,
        3.2418e-05, 2.7297e-06, 4.7746e-05, 1.4660e-05, 1.0720e-01, 7.3418e-01,
        2.3097e-05, 2.5464e-02, 3.5826e-05, 1.3317e-05, 8.6433e-06, 1.0467e-08,
        1.6150e-08]),tensor([3.4201e-03, 2.1435e-05, 1.3353e-05, 5.0382e-06, 2.2327e-05, 1.2700e-05,
        3.4319e-05, 3.4203e-03, 3.7694e-05, 2.4082e-04, 3.4410e-04, 3.1209e-04,
        1.6885e-02, 2.4353e-02, 7.8388e-03, 7.9421e-02, 4.1039e-05, 8.2386e-05,
        1.3766e-03, 3.6992e-03, 3.0023e-01, 5.5487e-01, 3.3109e-03, 3.7449e-06,
        1.6923e-06])])]
inp_ids = hf_tokenizer.encode('What did George Lucas make?',
                              'George Lucas created Star Wars in 1977. He directed and produced it.')

hf_tokenizer.convert_ids_to_tokens(inp_ids, skip_special_tokens=False)[11:13]
['star', 'wars']

Note that there is currently a bug in fastai v2 (or in how I'm assembling everything) that prevents us from seeing the decoded predictions and probabilities for the "end" token.

inf_df = pd.DataFrame.from_dict([{
    'question': 'When was Star Wars made?',
    'context': 'George Lucas created Star Wars in 1977. He directed and produced it.'
}], 
    orient='columns')

test_dl = dls.test_dl(inf_df)
inp = test_dl.one_batch()[0]['input_ids']
probs, _, preds = learn.get_preds(dl=test_dl, with_input=False, with_decoded=True)
hf_tokenizer.convert_ids_to_tokens(inp.tolist()[0], 
                                   skip_special_tokens=False)[torch.argmax(probs[0]):torch.argmax(probs[1])]
['1977']

We can unfreeze and continue training like normal

learn.unfreeze()
learn.fit_one_cycle(3, lr_max=slice(1e-7, 1e-4))
epoch train_loss valid_loss time
0 0.998746 0.454062 00:07
1 0.788146 0.426064 00:07
2 0.641020 0.407723 00:08
learn.recorder.plot_loss()
learn.show_results(learner=learn, max_n=2, trunc_at=100)
text start/end answer pred start/end pred answer
0 where did beyonce exclusively release her single, formation? on february 6, 2016, one day before her (38, 39) tidal (38, 39) tidal
1 her first appearance performing since giving birth was where? on january 7, 2012, beyonce gave birth (52, 61) revel atlantic city's ovation hall (52, 61) revel atlantic city's ovation hall
learn.blurr_predict(inf_df.iloc[0])
[(('14', '15'),
  (#2) [tensor(14),tensor(15)],
  (#2) [tensor([2.0139e-07, 6.0186e-08, 1.3398e-08, 8.2077e-09, 7.6575e-09, 2.5011e-08,
        3.6197e-09, 2.0140e-07, 1.6501e-06, 7.0809e-07, 6.9617e-06, 4.8223e-06,
        1.0504e-06, 9.7600e-04, 9.9901e-01, 9.0924e-07, 8.0747e-08, 4.3993e-08,
        6.1624e-09, 3.3514e-08, 7.0549e-08, 6.1190e-08, 1.9218e-07]),tensor([4.5972e-04, 1.5797e-05, 6.4806e-06, 2.9688e-06, 4.6566e-06, 2.9982e-06,
        1.1183e-05, 4.5972e-04, 2.5425e-05, 3.4846e-05, 5.2598e-05, 9.6193e-06,
        3.9660e-05, 1.1938e-04, 1.1622e-03, 9.9376e-01, 3.1741e-03, 2.8255e-05,
        2.0109e-05, 1.9298e-05, 7.3358e-05, 6.7907e-05, 4.4704e-04])])]
preds, pred_classes, probs = zip(*learn.blurr_predict(inf_df.iloc[0]))
preds
(('14', '15'),)
inp_ids = hf_tokenizer.encode('When was Star Wars made?',
                              'George Lucas created Star Wars in 1977. He directed and produced it.')

hf_tokenizer.convert_ids_to_tokens(inp_ids, skip_special_tokens=False)[int(preds[0][0]):int(preds[0][1])]
['1977']

Inference

Note that I had to swap out the loss function because of the above-mentioned issue with exporting the model when the MultiTargetLoss loss function is attached. After getting our inference learner, we put it back and we're good to go!

export_name = 'q_and_a_learn_export'
learn.loss_func = CrossEntropyLossFlat()
learn.export(fname=f'{export_name}.pkl')
inf_learn = load_learner(fname=f'{export_name}.pkl')
inf_learn.loss_func = MultiTargetLoss()

inf_df = pd.DataFrame.from_dict([
    {
        'question': 'What did George Lucas make?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.'   
    }, {
        'question': 'What year did Star Wars come out?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.' 
    }, {
        'question': 'What did George Lucas do?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.' 
    }], 
    orient='columns')

inf_learn.blurr_predict(inf_df)
[(('11', '13'),
  (#2) [tensor(11),tensor(13)],
  (#2) [tensor([1.8343e-07, 5.0318e-08, 4.8847e-09, 8.4643e-09, 5.8875e-09, 7.1094e-09,
        7.2085e-10, 1.8345e-07, 1.3470e-04, 1.2801e-05, 5.9846e-04, 9.9913e-01,
        1.1949e-04, 1.4157e-07, 3.5095e-06, 1.6165e-07, 1.3399e-06, 1.5361e-06,
        1.6692e-08, 7.6845e-07, 2.2477e-07, 4.0865e-08, 1.0515e-07, 4.5671e-10,
        6.0244e-10]),tensor([6.7522e-04, 2.6496e-05, 1.9889e-06, 7.5134e-07, 2.1462e-06, 1.3290e-06,
        5.0080e-06, 6.7526e-04, 7.9513e-06, 2.0008e-05, 2.3254e-05, 4.8967e-06,
        2.2808e-02, 7.4460e-01, 1.1859e-02, 2.1369e-01, 7.4602e-05, 2.3168e-05,
        4.3674e-05, 4.5102e-05, 1.5111e-03, 3.3360e-03, 5.5911e-04, 4.4061e-07,
        3.0934e-07])]),
 (('16', '17'),
  (#2) [tensor(16),tensor(17)],
  (#2) [tensor([1.1055e-06, 1.8704e-06, 4.5308e-08, 3.7884e-08, 3.0064e-08, 2.3709e-08,
        3.2830e-08, 3.9723e-08, 1.7936e-08, 1.1055e-06, 5.2583e-07, 6.4640e-07,
        9.6506e-07, 1.8904e-06, 1.1272e-06, 2.7888e-05, 9.9996e-01, 5.5272e-07,
        1.1390e-07, 1.1871e-07, 3.3400e-08, 1.2625e-07, 2.4034e-07, 2.9475e-07,
        1.1031e-06]),tensor([1.4690e-03, 3.0677e-04, 2.0203e-04, 8.3740e-05, 3.4471e-05, 5.6418e-05,
        3.7654e-05, 5.9635e-05, 1.2717e-04, 1.4690e-03, 1.7384e-04, 2.6960e-04,
        2.0993e-04, 1.0054e-04, 2.9895e-04, 3.2731e-04, 3.5134e-03, 9.8572e-01,
        2.8103e-03, 2.1733e-04, 1.6969e-04, 1.6633e-04, 3.8966e-04, 3.1913e-04,
        1.4693e-03])]),
 (('17', '21'),
  (#2) [tensor(17),tensor(21)],
  (#2) [tensor([9.8801e-06, 4.2964e-07, 1.1790e-07, 2.2178e-07, 9.1010e-08, 1.3551e-07,
        2.9758e-08, 9.8821e-06, 3.2094e-03, 4.5714e-04, 1.5957e-01, 3.8280e-04,
        2.7563e-05, 1.3641e-06, 4.6444e-05, 1.0686e-05, 1.1351e-01, 6.9439e-01,
        2.3044e-05, 2.8310e-02, 2.4390e-05, 8.8607e-06, 9.0558e-06, 1.2868e-08,
        2.0218e-08]),tensor([1.9702e-03, 8.6120e-06, 5.9526e-06, 2.1202e-06, 6.5693e-06, 4.6015e-06,
        1.1350e-05, 1.9704e-03, 1.2754e-05, 6.1397e-05, 1.4907e-04, 1.4174e-04,
        8.3424e-03, 1.6695e-02, 2.8036e-03, 3.3022e-02, 1.5188e-05, 3.1796e-05,
        5.4052e-04, 1.2613e-03, 3.0156e-01, 6.2954e-01, 1.8494e-03, 1.7348e-06,
        8.5971e-07])])]
inp_ids = hf_tokenizer.encode('What did George Lucas make?',
                              'George Lucas created Star Wars in 1977. He directed and produced it.')

hf_tokenizer.convert_ids_to_tokens(inp_ids, skip_special_tokens=False)[11:13]
['star', 'wars']

High-level API

BlearnerForQuestionAnswering

class BlearnerForQuestionAnswering[source]

BlearnerForQuestionAnswering(dls:DataLoaders, hf_model:PreTrainedModel, base_model_cb:HF_BaseModelCallback=HF_BaseModelCallback, loss_func=None, opt_func=Adam, lr=0.001, splitter=trainable_params, cbs=None, metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85, 0.95)) :: Blearner

Group together a model, some dls and a loss_func to handle training

Parameters:

  • dls : <class 'fastai.data.core.DataLoaders'>

  • hf_model : <class 'transformers.modeling_utils.PreTrainedModel'>

  • kwargs : <class 'inspect._empty'>

BlearnerForQuestionAnswering requires a question, a context (within which to find the answer to the question), and the start/end indices of where the answer lies in the tokenized context. Because those indices vary by tokenizer, we can pass a preprocess_func that will take our raw data, perform any preprocessing we want, and return it in a way that will work for extractive QA.

def preprocess_df(df, hf_arch, hf_config, hf_tokenizer, hf_model, max_seq_len, 
                  context_attr, question_attr, answer_text_attr, tok_ans_start_attr, tok_ans_end_attr):
    
    df = df.apply(partial(pre_process_squad, hf_arch=hf_arch, hf_tokenizer=hf_tokenizer, ctx_attr=context_attr, 
                          qst_attr=question_attr, ans_attr=answer_text_attr), axis=1)
    
    df = df[(df.tokenized_input_len < max_seq_len) & (df.is_impossible == False)]
    
    return df

Let's re-grab the raw data and use the high-level API to train.

squad_df = pd.read_csv('./squad_sample.csv')

pretrained_model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'

learn = BlearnerForQuestionAnswering.from_dataframe(squad_df, pretrained_model_name,
                                                    preprocess_func=preprocess_df, max_seq_len=128,
                                                    dblock_splitter=RandomSplitter(), 
                                                    dl_kwargs={ 'bs': 4 }).to_fp16()
learn.dls.show_batch(dataloaders=learn.dls, max_n=2, trunc_at=500)
text start/end answer
0 where did beyonce perform in 2011? in 2011, documents obtained by wikileaks revealed that beyonce was one of many entertainers who performed for the family of libyan ruler muammar gaddafi. rolling stone reported that the music industry was urging them to return the money they earned for the concerts ; a spokesperson for beyonce later confirmed to the huffington post that she donated the money to the clinton bush haiti fund. later that year she became the first solo female artist to headline the (102, 107) glastonbury festival
1 what language does she mainly sing? beyonce's music is generally r & b, but she also incorporates pop, soul and funk into her songs. 4 demonstrated beyonce's exploration of 90s - style r & b, as well as further use of soul and hip hop than compared to previous releases. while she almost exclusively releases english songs, beyonce recorded several spanish songs for irreemplazable ( re - recordings of songs from b'day for a spanish - language audience ), and the re - release of b'day. to record th (67, 68) english
learn.fit_one_cycle(3, lr_max=1e-3)
epoch train_loss valid_loss time
0 4.252031 1.568532 00:05
1 2.521215 0.815515 00:05
2 1.775412 0.717965 00:05
learn.show_results(learner=learn, skip_special_tokens=True, max_n=2, trunc_at=500)
text start/end answer pred start/end pred answer
0 how much bail money did they spend? following the death of freddie gray, beyonce and jay - z, among other notable figures, met with his family. after the imprisonment of protesters of gray's death, beyonce and jay - z donated thousands of dollars to bail them out. (50, 53) thousands of dollars (50, 53) thousands of dollars
1 how was the suit settled? the release of a video - game starpower : beyonce was cancelled after beyonce pulled out of a $ 100 million with gatefive who alleged the cancellation meant the sacking of 70 staff and millions of pounds lost in development. it was settled out of court by her lawyers in june 2013 who said that they had cancelled because gatefive had lost its financial backers. beyonce also has had deals with american express, nintendo ds and l'oreal since the age of 18. (56, 59) out of court (56, 59) out of court
learn.loss_func = CrossEntropyLossFlat()
learn.export(fname=f'{export_name}.pkl')
inf_learn = load_learner(fname=f'{export_name}.pkl')
inf_learn.loss_func = MultiTargetLoss()

inf_df = pd.DataFrame.from_dict([
    {
        'question': 'What did George Lucas make?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.'   
    }, {
        'question': 'What year did Star Wars come out?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.' 
    }, {
        'question': 'What did George Lucas do?',
        'context': 'George Lucas created Star Wars in 1977. He directed and produced it.' 
    }], 
    orient='columns')

inf_learn.blurr_predict(inf_df)
[(('11', '13'),
  (#2) [tensor(11),tensor(13)],
  (#2) [tensor([2.7284e-07, 8.3190e-08, 3.7982e-09, 7.9587e-09, 6.0532e-09, 5.2288e-09,
        6.7031e-10, 2.7283e-07, 1.3389e-04, 1.2072e-05, 4.0296e-04, 9.9912e-01,
        2.9206e-04, 4.0871e-07, 1.7246e-05, 5.3299e-07, 7.4445e-06, 9.2675e-06,
        2.7095e-08, 4.0778e-06, 1.4063e-06, 2.7503e-07, 2.7421e-07, 3.4118e-10,
        4.1309e-10]),tensor([2.2255e-03, 1.9398e-04, 1.4811e-05, 6.0629e-06, 2.3273e-05, 1.2139e-05,
        4.9242e-05, 2.2257e-03, 6.3393e-05, 2.5946e-04, 2.7473e-04, 4.4387e-05,
        8.3237e-02, 4.8414e-01, 1.9951e-02, 3.9145e-01, 3.2640e-04, 1.6923e-04,
        4.2092e-04, 4.4922e-04, 1.0046e-02, 2.2117e-03, 2.1988e-03, 2.9638e-06,
        1.8428e-06])]),
 (('16', '17'),
  (#2) [tensor(16),tensor(17)],
  (#2) [tensor([5.6653e-07, 2.0562e-06, 2.7093e-08, 1.3462e-08, 1.3675e-08, 9.8854e-09,
        1.6482e-08, 1.7449e-08, 8.9821e-09, 5.6657e-07, 6.3263e-07, 7.9726e-07,
        9.9635e-07, 2.6835e-06, 7.8746e-07, 2.4374e-05, 9.9996e-01, 1.0222e-06,
        1.4942e-07, 1.1580e-07, 2.3458e-08, 1.3362e-07, 3.7979e-07, 5.6670e-07,
        5.6669e-07]),tensor([2.9526e-03, 7.3836e-04, 6.0738e-04, 2.0252e-04, 7.6138e-05, 1.5352e-04,
        8.7644e-05, 1.7705e-04, 5.1405e-04, 2.9525e-03, 7.0142e-04, 1.1698e-03,
        7.6684e-04, 3.1254e-04, 1.2962e-03, 1.3482e-03, 1.0334e-02, 9.5583e-01,
        9.3413e-03, 8.3610e-04, 8.9849e-04, 7.6981e-04, 2.0273e-03, 2.9528e-03,
        2.9529e-03])]),
 (('17', '21'),
  (#2) [tensor(17),tensor(21)],
  (#2) [tensor([7.6245e-06, 2.5309e-07, 4.6802e-08, 1.1821e-07, 5.4243e-08, 5.6511e-08,
        1.4816e-08, 7.6252e-06, 3.1773e-03, 3.0238e-04, 1.2024e-01, 4.0060e-04,
        3.5399e-05, 2.5930e-06, 7.0493e-05, 7.6382e-06, 9.2087e-02, 7.4348e-01,
        1.6772e-05, 4.0094e-02, 5.2537e-05, 1.6554e-05, 7.6619e-06, 6.5125e-09,
        9.3592e-09]),tensor([3.2386e-03, 3.2943e-05, 2.0672e-05, 7.7881e-06, 3.0920e-05, 1.7127e-05,
        5.0501e-05, 3.2389e-03, 6.1905e-05, 3.5101e-04, 6.3880e-04, 3.3710e-04,
        1.3215e-02, 1.3914e-02, 6.1565e-03, 3.2390e-03, 7.0665e-05, 1.4825e-04,
        1.5560e-03, 6.0942e-03, 3.0879e-01, 6.3557e-01, 3.2132e-03, 5.1384e-06,
        2.4964e-06])])]

Summary

This module includes all the low-, mid-, and high-level API bits for training and inference on extractive Q&A tasks.