This page shows all of the high-level `BlearnerFor*` classes in action, using raw data sourced from the Hugging Face Datasets library.
 
Here's what we're running with:

Using pytorch 1.7.1
Using fastai 2.4
Using transformers 4.8.1
Using GPU #1: GeForce GTX 1080 Ti

While most of the code and examples in the documentation show how to work with Blurr given a pandas DataFrame, this set of examples shows how to use the high-level Blurr API with any Hugging Face dataset. The high-level API provides one-liners that build your DataBlock, DataLoaders, and Learner (with sensible defaults) from a DataFrame, a CSV file, or a list of dictionaries, which is what we use here.
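A Hugging Face `Dataset` can stand in for a list of dictionaries because indexing it returns one row as a `dict`. The minimal sketch below makes that row shape concrete in plain Python, assuming column-oriented data like the output of `Dataset.to_dict()`. The helper `columns_to_records` is hypothetical and is not part of Blurr or `datasets`.

```python
from typing import Any, Dict, List


def columns_to_records(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn {column: [values...]} into [{column: value, ...}, ...], one dict per row."""
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    # every column must have the same number of rows
    assert all(len(columns[name]) == n_rows for name in names), "ragged columns"
    return [{name: columns[name][i] for name in names} for i in range(n_rows)]


# two rows shaped like the GLUE CoLA examples below (the second row is made up)
cola_columns = {
    'sentence': ["Our friends won't buy this analysis, let alone the next one we propose.",
                 "The cat sat on the mat."],
    'label': [1, 1],
    'idx': [0, 1],
}
records = columns_to_records(cola_columns)
print(records[0]['label'])  # 1
```

Each record exposes keys such as `sentence` and `label`. These are the names passed as `text_attr` and `label_attr` in the examples that follow.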

Sequence Classification

Multiclass classification (one input)

from datasets import load_dataset, concatenate_datasets
raw_datasets = load_dataset('glue', 'cola')
print(f'{raw_datasets}\n')
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')
Reusing dataset glue (/home/wgilliam/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 8551
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1043
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1063
    })
})

{'idx': 0, 'label': 1, 'sentence': "Our friends won't buy this analysis, let alone the next one we propose."}

{'sentence': Value(dtype='string', id=None), 'label': ClassLabel(num_classes=2, names=['unacceptable', 'acceptable'], names_file=None, id=None), 'idx': Value(dtype='int32', id=None)}

Record the number of rows in the train and validation sets, combine the two into a single dataset with `concatenate_datasets` from the `datasets` library, and then pass the validation rows' indices to fastai's `IndexSplitter` to define the train/validation split:

train_ds = raw_datasets['train']#.select(range(10000))
valid_ds = raw_datasets['validation']#.select(range(2000))
n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])
dl_kwargs = {'bs': 4, 'val_bs': 8}
learn_kwargs = { 'metrics': [accuracy] }
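After concatenation, the training rows occupy positions `0` to `n_train - 1` and the validation rows occupy positions `n_train` to `n_train + n_valid - 1`. The validation indices alone are therefore enough to determine the split. The following pure-Python sketch mimics that behavior. `index_split` is a hypothetical stand-in for illustration and is not fastai's `IndexSplitter` itself.

```python
from typing import List, Sequence, Tuple


def index_split(n_items: int, valid_idxs: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (train_idxs, valid_idxs): every index not in valid_idxs goes to training."""
    valid_set = set(valid_idxs)
    train_idxs = [i for i in range(n_items) if i not in valid_set]
    return train_idxs, list(valid_idxs)


# CoLA sizes from the output above: 8551 train rows, 1043 validation rows
n_train, n_valid = 8551, 1043
train_idxs, valid_idxs = index_split(n_train + n_valid, range(n_train, n_train + n_valid))

print(len(train_idxs), len(valid_idxs))  # 8551 1043
print(train_idxs[-1], valid_idxs[0])     # 8550 8551
```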

learn = BlearnerForSequenceClassification.from_dictionaries(raw_ds, 'distilroberta-base', 
                                                            text_attr='sentence', label_attr='label',
                                                            dblock_splitter=IndexSplitter(valid_idxs),
                                                            dl_kwargs=dl_kwargs, learner_kwargs=learn_kwargs)
learn = learn.to_fp16()
learn.dls.show_batch(dataloaders=learn.dls, trunc_at=500, max_n=5)
text category
0 Everybody who has ever, worked in any office which contained any typewriter which had ever been used to type any letters which had to be signed by any administrator who ever worked in any department like mine will know what I mean. 1
1 The victims of the earthquake their property was destroyed in the disaster were given temporary housing by the government. 1
2 It is this problem that the sooner you solve the more easily you'll satisfy the folks up at corporate headquarters. 1
3 Reports the height of the lettering on the covers of which the government prescribes should be abolished. 1
learn.fit_one_cycle(1, lr_max=2e-3)
epoch train_loss valid_loss accuracy time
0 0.486840 0.504518 0.766059 01:07
learn.show_results(learner=learn, max_n=5)
text category target
0 Scientists at the South Hanoi Institute of Technology have succeeded in raising one dog with five legs, another with a cow's liver, and a third with no head. 1 1
1 The newspaper has reported that they are about to appoint someone, but I can't remember who the newspaper has reported that they are about to appoint. 1 1
2 He attributed to a short circuit which was caused by an overloaded transducer the fire which destroyed most of my factory. 1 0
3 Sandy was trying to work out which students would be able to solve a certain problem, but she wouldn't tell us which one. 0 1
4 The newspaper has reported that they are about to appoint someone, but I can't remember who they are about to appoint. 1 1

`Learner.blurr_predict` works here too:

learn.blurr_predict('Blurr aint no joke yo')
[(('0',), (#1) [tensor(0)], (#1) [tensor([0.5803, 0.4197])])]
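The prediction above contains three parts: the decoded label (`'0'`), the predicted class index, and the per-class probabilities. The sketch below shows one way to read it. It assumes that the class order follows the CoLA `ClassLabel` names printed earlier (`['unacceptable', 'acceptable']`). The probabilities are copied from the output above.

```python
# class names as listed by raw_datasets["train"].features['label'] for CoLA
class_names = ['unacceptable', 'acceptable']
# per-class probabilities copied from the blurr_predict output above
probs = [0.5803, 0.4197]

# the predicted class is the index with the highest probability
pred_idx = max(range(len(probs)), key=lambda i: probs[i])
print(pred_idx, class_names[pred_idx])  # 0 unacceptable
```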

Multiclass classification (two inputs)

raw_datasets = load_dataset('glue', 'mrpc') 
print(f'{raw_datasets}\n')
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')
Reusing dataset glue (/home/wgilliam/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
DatasetDict({
    train: Dataset({
        features: ['sentence1', 'sentence2', 'label', 'idx'],
        num_rows: 3668
    })
    validation: Dataset({
        features: ['sentence1', 'sentence2', 'label', 'idx'],
        num_rows: 408
    })
    test: Dataset({
        features: ['sentence1', 'sentence2', 'label', 'idx'],
        num_rows: 1725
    })
})

{'idx': 0, 'label': 1, 'sentence1': 'Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .', 'sentence2': 'Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .'}

{'sentence1': Value(dtype='string', id=None), 'sentence2': Value(dtype='string', id=None), 'label': ClassLabel(num_classes=2, names=['not_equivalent', 'equivalent'], names_file=None, id=None), 'idx': Value(dtype='int32', id=None)}

train_ds = raw_datasets['train']#.select(range(10000))
valid_ds = raw_datasets['validation']#.select(range(2000))
n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])
dl_kwargs = {'bs': 4, 'val_bs': 8}
learn_kwargs = { 'metrics': [F1Score(), accuracy] }

learn = BlearnerForSequenceClassification.from_dictionaries(raw_ds, 'distilroberta-base', 
                                                            text_attr=['sentence1', 'sentence2'], 
                                                            label_attr='label',
                                                            dblock_splitter=IndexSplitter(valid_idxs),
                                                            dl_kwargs=dl_kwargs, learner_kwargs=learn_kwargs)
learn = learn.to_fp16()
learn.dls.show_batch(dataloaders=learn.dls, trunc_at=500, max_n=5)
text category
0 Amrozi accused his brother, whom he called " the witness ", of deliberately distorting his evidence. Referring to him as only " the witness ", Amrozi accused his brother of deliberately distorting his evidence. 1
1 However, Hayes, the CDC official, said there are many complicated interactions in play. But Hayes, of the CDC said, " Many complicated interactions come into play that are often difficult to predict. " 0
2 Smackdown, first organized by WWE, registered more than 400,000 voters for the 2000 presidential election. Since the election, the group has registered more than 400,000 voters. 0
3 The league said it is not taking a position on whether Governor Gray Davis should be recalled in the Oct. 7 voting, and will not endorse a replacement candidate. The group said it is not taking a position on whether Davis should be recalled and will not endorse a replacement candidate. 1
learn.fit_one_cycle(1, lr_max=2e-3)
epoch train_loss valid_loss f1_score accuracy time
0 0.509784 0.463203 0.843800 0.762255 00:29
learn.show_results(learner=learn, max_n=5)
text category target
0 He said the foodservice pie business doesn 't fit the company's long-term growth strategy. " The foodservice pie business does not fit our long-term growth strategy. 1 1
1 Mr. Young said he was disappointed that the government didn 't see the severe acute respiratory syndrome crisis as worthy of federal disaster-relief money. Young said he was disappointed the government didn 't see the SARS crisis as worthy of federal disaster relief money. 1 1
2 Saddam loyalists have been blamed for sabotaging the nation's infrastructure, as well as frequent attacks on U.S. soldiers. Hussein loyalists have been blamed for sabotaging the nation's infrastructure and attacking US soldiers. 1 1
3 If the MTA's appeal to a higher court is successful, the $ 2 bus and subway base fare won 't be rolled back. If the MTA's appeal is successful, the $ 2 bus and subway base fare won 't change. 1 1
4 Nelson, 27, is being retried on civil-rights charges stemming from the disturbance which led to Rosenbaum's death. Nelson, 27, is being retried on civil rights charges stemming from the disturbance that led to Rosenbaum's death. 1 1

Multilabel classification

raw_datasets = load_dataset('civil_comments')
print(f'{raw_datasets}\n')
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')
Using custom data configuration default
Reusing dataset civil_comments (/home/wgilliam/.cache/huggingface/datasets/civil_comments/default/0.9.0/e7a3aacd2ab7d135fa958e7209d10b1fa03807d44c486e3c34897aa08ea8ffab)
DatasetDict({
    train: Dataset({
        features: ['text', 'toxicity', 'severe_toxicity', 'obscene', 'threat', 'insult', 'identity_attack', 'sexual_explicit'],
        num_rows: 1804874
    })
    validation: Dataset({
        features: ['text', 'toxicity', 'severe_toxicity', 'obscene', 'threat', 'insult', 'identity_attack', 'sexual_explicit'],
        num_rows: 97320
    })
    test: Dataset({
        features: ['text', 'toxicity', 'severe_toxicity', 'obscene', 'threat', 'insult', 'identity_attack', 'sexual_explicit'],
        num_rows: 97320
    })
})

{'identity_attack': 0.0, 'insult': 0.0, 'obscene': 0.0, 'severe_toxicity': 0.0, 'sexual_explicit': 0.0, 'text': "This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!", 'threat': 0.0, 'toxicity': 0.0}

{'text': Value(dtype='string', id=None), 'toxicity': Value(dtype='float32', id=None), 'severe_toxicity': Value(dtype='float32', id=None), 'obscene': Value(dtype='float32', id=None), 'threat': Value(dtype='float32', id=None), 'insult': Value(dtype='float32', id=None), 'identity_attack': Value(dtype='float32', id=None), 'sexual_explicit': Value(dtype='float32', id=None)}

lbl_cols = ['identity_attack', 'insult', 'obscene', 'toxicity', 'severe_toxicity', 'sexual_explicit', 'threat']
train_ds = raw_datasets['train'].select(range(10000))
valid_ds = raw_datasets['validation'].select(range(2000))
n_train, n_valid = len(train_ds), len(valid_ds)
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])

The labels need to be one-hot encoded as ints. The raw data stores them as floats, so we round each score to 0 or 1. We could also do this kind of preprocessing by passing a `preprocess_func` to the `BlearnerForSequenceClassification` factory method. That approach is especially useful when the preprocessing depends on one or more of the Hugging Face objects (e.g., the config, tokenizer, model, or architecture).

def make_ohe(item):
    # round each float label score (0.0 - 1.0) to an int (0 or 1)
    for k in lbl_cols:
        item[k] = int(np.round(item[k]))
    return item

raw_ds = raw_ds.map(make_ohe)
Loading cached processed dataset at /home/wgilliam/.cache/huggingface/datasets/civil_comments/default/0.9.0/e7a3aacd2ab7d135fa958e7209d10b1fa03807d44c486e3c34897aa08ea8ffab/cache-615cf128814440d8.arrow
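`make_ohe` thresholds each float score by rounding it. Both `np.round` and Python's built-in `round` round exact halves to the nearest even integer, so a score of exactly `0.5` becomes `0`, not `1`. The sketch below reproduces the conversion in plain Python using the built-in `round`. The record and its scores are made up to show this behavior, and `make_ohe_plain` is a hypothetical name.

```python
lbl_cols = ['identity_attack', 'insult', 'obscene', 'toxicity',
            'severe_toxicity', 'sexual_explicit', 'threat']


def make_ohe_plain(item: dict) -> dict:
    """Round each float label score to 0 or 1 (built-in round: halves go to the even int)."""
    return {k: (int(round(v)) if k in lbl_cols else v) for k, v in item.items()}


# hypothetical record with scores chosen to show the rounding behavior
item = {'text': 'example comment', 'toxicity': 0.7, 'insult': 0.5, 'obscene': 0.2,
        'identity_attack': 0.0, 'severe_toxicity': 0.0, 'sexual_explicit': 0.0, 'threat': 0.51}
ohe = make_ohe_plain(item)
print(ohe['toxicity'], ohe['insult'], ohe['threat'])  # 1 0 1
```

If you want a score of 0.5 to count as positive, an explicit threshold such as `int(v >= 0.5)` states that intent more clearly than rounding.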
dl_kwargs = {'bs': 4, 'val_bs': 8}
learn_kwargs = { 'metrics': [F1ScoreMulti(), accuracy_multi] }

# using a List[dict] such as a Hugging Face dataset
learn = BlearnerForSequenceClassification.from_dictionaries(raw_ds, 'distilroberta-base', 
                                                            text_attr='text', label_attr=lbl_cols,
                                                            dblock_splitter=IndexSplitter(valid_idxs),
                                                            dl_kwargs=dl_kwargs, learner_kwargs=learn_kwargs)
learn = learn.to_fp16()
learn.dls.show_batch(dataloaders=learn.dls, trunc_at=500, max_n=5)
text None
0 Predatory patrol towing isn't a big subject, and there is no advocacy group that is paying any attention to it, but the City of Portland has completely backed off of enforcing state law where the towing predators are operating on private property, and this is Commissioner Novick's failure. He's in charge of towing.\n\nThe City has allowed Retriever Towing to operate in open violation of ADA for years at their NW Quimby lot, and there is absolutely no provision in city ordinance that takes into ac
1 As usual WW plumbing the depths for deeper meaning... that is unless it involves an issue on which they disagree then it is ridicule 24/7. Clever creating the Bundyland series complete with cartoon banner. Set the tone for the level of journalism to expect... journalism? ... fatastisticism. \n\nI did notice you soft pedaling the ridicule of David Fry identifying him as troubled. My guess is that has more to do with sympathy for his pot smoking withdrawl rants than respect for his politics. R
2 we bear SOME responsibility for our own safety? I think we bear 90% of it. people ae just dumb about street smarts - they've been conditioned by our backwards laws. cars rule, simply cause they are bigger and will hurt you should there be a misunderstanding of who has the right away. using a crosswalk helps a heck of a lot.\n as a walker, I always yielded in a parking lot to a car backing out of a space. common sense, yeah? the drivers have 2 directions that other cars can come from, pe
3 [... continued]\nMueller is at his best with "Thank goodness more rational, professional voices prevailed" and widened W 6th and 7th Aves to four lanes each. These two streets were already serious barriers, separating the Westside from our sister neighborhood Whiteaker and walking to the river. They became essentially a Trump-like "wall" after they were widened even further. I moved into the Westside decades ago because of its "walkability," and I'm still agile enough to dodge the cars and get f
learn.fit_one_cycle(1, lr_max=2e-3)
epoch train_loss valid_loss f1_score accuracy_multi time
0 0.026036 0.039305 0.204006 0.986999 01:27
/home/wgilliam/miniconda3/envs/blurr/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1495: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(
learn.show_results(learner=learn, trunc_at=500, max_n=5)
/home/wgilliam/miniconda3/envs/blurr/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1495: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(
text None target
0 Everyone tries to hack everyone else. I have no doubt Russia would try to hack even canada. However, the US has been doing the same, if we recall Snowden.\n\nEven Merkel's phone conversations were being tapped by the CIA. \n\nThe real purpose of this issue is political. Trump is upset because people are trying to imply that he didn't deserve his victory, that the Russians helped him. It's an ego thing. Good CEOs sometimes have giant egos. I have no problem with that as long as they produce results, I gladly buy shares in their company.\n\nOtoh, Russia did invade Crimea recently, and their missile brought down a commercial airliner and killed lots of innocent people. The world has a right to be annoyed at the Russians.\n\nIf you want to find evidence of Russians hacking, you will find them. But if you want to find China or some guy in a basement somewhere, I have no doubt you can find the same as well. Whether they succeeded or not, that's hard to prove, but there's lots of blackhats []
1 Mr. Alali, I am sympathetic to your position and feelings. As a Canadian I hold no ill will towards you or your family relocating to Canada. You should be aware that you and your family have been used as political pawns following the glib and ill conceived election promise made by our Prime Minister to bring 25 thousand of your compatriots to Canada by the end of 2015. It sounds as if Mr McCallum and assistants scoured UN refugee lists in an effort to press gang hapless individuals and coerce them into settling and being shipped to Canada. The expedition of your arrival was made with no regard to the logistics of accommodating vulnerable and traumatized families in a respectful and decent manner following your staged and publicized arrivals. As for your future in Canada I fear you will be lucky to find some subsistence level employment. The chances are that your children, if you allow them to assimilate into the Canadian culture, will thrive and have a rewarding life in here. Good Luck []
2 I abjure violence of any kind and this includes violence propagated through money by those who have the means to do so. I believe in free speech except when used to incite violence or hatred. I believe in the right of individual freedoms providing harm to others is not caused.. This is my bias.\n\nOthers believe an economic and social order that favours a few is natural to human nature and that those who can gain advantage, without consideration of harm to others, should be allowed to do so. That is their bias.\n\nEither view will result in bad behaviour by either side depending on who is ascendant. Trump was not against suppressing free speech or condoning violence as one could see during his campaign rallies.\n\nThe world will never be perfect or fair but beginning with Roosevelt and ending with Reagan there was a time when average people could see a slow but steady increase in living standards and opportunity. The rise of neo-liberal policies has removed this expectation for most. []
3 Chittester quotes Biden--an excommunicated Catholic.\nHilliary did not represent Catholics-- which is why Trump won the Catholic vote.\nThese posters represent PC--progressive communists and propaganda suckers. They seem to soak up every piece of propaganda issued by progressive operatives.\nI' m sure they all " believe" in AGW and population control as the remedy through abortion, they " believe" Adam and Eve are mythological but evolution-- long ago debunked-- is fact, they\n " belive" women' s rights and feminism are not really code words for abortion and they \n" believe" poor Hilliary missed her chance to make partial birth abortions taxpayer funded, religious rights non- existent, create more world chaos, incite more racial riots and divisive conflict, shut down oil companies and erect windmills, gut the country of any remaining blue collar jobs, ensure no conservative speaks outbin any college classroom, make sure all young college men have no rights when accused by disturbed girls. []
4 sometimes it's hard to judge which is company is better. your location has a lot to do with it especially if you live in an apartment building. sometimes you have access to both companies but often it can be just one of them and you have no choice. sometimes you don't have access to the best or fastest service the companies have to offer because of an aging infrastructure. ymmv is the best way to describe the situation.\n\nas far as comparing to other companies and services on the mainland we could do a lot worse. there are places where there are monopolies. worse when you're stuck with a company like comcast and you don't even have unlimited internet which is taken for granted here never mind some "top" speeds of 25 megabits. bundled service for one television and a monthly fee of over $200 is just nuts. this example is from cities near the capital of washington state.\n\nso yes we could have better service but looking from other locations you could also say we're lucky for what we have. []

Token Classification

raw_datasets = load_dataset('germeval_14') 
print(f'{raw_datasets}\n')
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')
Reusing dataset germ_eval14 (/home/wgilliam/.cache/huggingface/datasets/germ_eval14/germeval_14/2.0.0/0f174b84866aa3b8ebae65c271610520be4422405d7e8467bd24cfd493d325f0)
DatasetDict({
    train: Dataset({
        features: ['id', 'source', 'tokens', 'ner_tags', 'nested_ner_tags'],
        num_rows: 24000
    })
    validation: Dataset({
        features: ['id', 'source', 'tokens', 'ner_tags', 'nested_ner_tags'],
        num_rows: 2200
    })
    test: Dataset({
        features: ['id', 'source', 'tokens', 'ner_tags', 'nested_ner_tags'],
        num_rows: 5100
    })
})

{'id': '0', 'ner_tags': [19, 0, 0, 0, 7, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'nested_ner_tags': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'source': 'n-tv.de vom 26.02.2005 [2005-02-26] ', 'tokens': ['Schartau', 'sagte', 'dem', '"', 'Tagesspiegel', '"', 'vom', 'Freitag', ',', 'Fischer', 'sei', '"', 'in', 'einer', 'Weise', 'aufgetreten', ',', 'die', 'alles', 'andere', 'als', 'überzeugend', 'war', '"', '.']}

{'id': Value(dtype='string', id=None), 'source': Value(dtype='string', id=None), 'tokens': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None), 'ner_tags': Sequence(feature=ClassLabel(num_classes=25, names=['O', 'B-LOC', 'I-LOC', 'B-LOCderiv', 'I-LOCderiv', 'B-LOCpart', 'I-LOCpart', 'B-ORG', 'I-ORG', 'B-ORGderiv', 'I-ORGderiv', 'B-ORGpart', 'I-ORGpart', 'B-OTH', 'I-OTH', 'B-OTHderiv', 'I-OTHderiv', 'B-OTHpart', 'I-OTHpart', 'B-PER', 'I-PER', 'B-PERderiv', 'I-PERderiv', 'B-PERpart', 'I-PERpart'], names_file=None, id=None), length=-1, id=None), 'nested_ner_tags': Sequence(feature=ClassLabel(num_classes=25, names=['O', 'B-LOC', 'I-LOC', 'B-LOCderiv', 'I-LOCderiv', 'B-LOCpart', 'I-LOCpart', 'B-ORG', 'I-ORG', 'B-ORGderiv', 'I-ORGderiv', 'B-ORGpart', 'I-ORGpart', 'B-OTH', 'I-OTH', 'B-OTHderiv', 'I-OTHderiv', 'B-OTHpart', 'I-OTHpart', 'B-PER', 'I-PER', 'B-PERderiv', 'I-PERderiv', 'B-PERpart', 'I-PERpart'], names_file=None, id=None), length=-1, id=None)}

train_ds = raw_datasets['train']#.select(range(1000))
valid_ds = raw_datasets['validation']#.select(range(500))
n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])

We can grab the labels a token can be associated with, as we do here, or we can let the `BlearnerForTokenClassification` factory methods figure them out for us.

labels = train_ds.features['ner_tags'].feature.names
len(labels)
25
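The 25 tag names follow the BIO scheme: `'O'`, plus a `B-` and an `I-` variant for each of 12 entity types. This is why the classification report further down lists 12 entity types rather than 25 labels. The short sketch below derives the entity types and the index/name lookups from the names printed above. The variable names `id2label`, `label2id`, and `entity_types` are illustrative.

```python
labels = ['O', 'B-LOC', 'I-LOC', 'B-LOCderiv', 'I-LOCderiv', 'B-LOCpart', 'I-LOCpart',
          'B-ORG', 'I-ORG', 'B-ORGderiv', 'I-ORGderiv', 'B-ORGpart', 'I-ORGpart',
          'B-OTH', 'I-OTH', 'B-OTHderiv', 'I-OTHderiv', 'B-OTHpart', 'I-OTHpart',
          'B-PER', 'I-PER', 'B-PERderiv', 'I-PERderiv', 'B-PERpart', 'I-PERpart']

# index <-> tag lookups
id2label = dict(enumerate(labels))
label2id = {lbl: i for i, lbl in enumerate(labels)}

# strip the B-/I- prefix to get the entity type; 'O' (outside) is not an entity
entity_types = sorted({lbl[2:] for lbl in labels if lbl != 'O'})
print(len(labels), len(entity_types))  # 25 12
print(entity_types)
```

The sorted `entity_types` match the rows of the classification report, in the same order.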

We need to pass the tag (not its index) for each of an example's tokens as a list. To do this, we use the handy `datasets` `map` function to create a new attribute, "token_labels", that holds this data. This could also be done by passing a `preprocess_func` to a `BlearnerForTokenClassification` factory method. That approach is especially useful if we need one or more of the Hugging Face objects (e.g., the tokenizer, model, config, or architecture name).

def get_item_labels(example):
    example['token_labels'] = [ labels[tag_idx] for tag_idx in example['ner_tags'] ]
    return example
                         
raw_ds = raw_ds.map(get_item_labels)
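To see what `get_item_labels` produces, the sketch below applies the same mapping by hand to the first training example printed above. The tokens and `ner_tags` are copied from that output, and `labels` is the 25-name list from the `ner_tags` feature.

```python
labels = ['O', 'B-LOC', 'I-LOC', 'B-LOCderiv', 'I-LOCderiv', 'B-LOCpart', 'I-LOCpart',
          'B-ORG', 'I-ORG', 'B-ORGderiv', 'I-ORGderiv', 'B-ORGpart', 'I-ORGpart',
          'B-OTH', 'I-OTH', 'B-OTHderiv', 'I-OTHderiv', 'B-OTHpart', 'I-OTHpart',
          'B-PER', 'I-PER', 'B-PERderiv', 'I-PERderiv', 'B-PERpart', 'I-PERpart']

tokens = ['Schartau', 'sagte', 'dem', '"', 'Tagesspiegel', '"', 'vom', 'Freitag', ',',
          'Fischer', 'sei', '"', 'in', 'einer', 'Weise', 'aufgetreten', ',', 'die', 'alles',
          'andere', 'als', 'überzeugend', 'war', '"', '.']
ner_tags = [19, 0, 0, 0, 7, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def get_item_labels(example):
    # replace each tag index with its tag name
    example['token_labels'] = [labels[tag_idx] for tag_idx in example['ner_tags']]
    return example


example = get_item_labels({'tokens': tokens, 'ner_tags': ner_tags})
# show only the tokens that carry an entity tag
print([(tok, lbl) for tok, lbl in zip(example['tokens'], example['token_labels']) if lbl != 'O'])
# [('Schartau', 'B-PER'), ('Tagesspiegel', 'B-ORG'), ('Fischer', 'B-PER')]
```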
learn = BlearnerForTokenClassification.from_dictionaries(raw_ds, 'bert-base-multilingual-cased', 
                                                         tokens_attr='tokens', token_labels_attr='token_labels', 
                                                         labels=labels, dblock_splitter=IndexSplitter(valid_idxs), 
                                                         dl_kwargs={'bs':2})

learn.unfreeze()
fit_cbs = [HF_TokenClassMetricsCallback()]
learn.dls.show_batch(dataloaders=learn.dls, max_n=2)
token / target label
0 [('Andere', 'O'), ('Albumtitel', 'O'), ('sind', 'O'), ('an', 'O'), ('bekannte', 'O'), ('Begriffe', 'O'), ('angelehnt', 'O'), (':', 'O'), ('Fettes', 'B-OTH'), ('Brot', 'I-OTH'), ('für', 'I-OTH'), ('die', 'I-OTH'), ('Welt', 'I-OTH'), ('(', 'O'), ('Brot', 'O'), ('für', 'O'), ('die', 'O'), ('Welt', 'O'), ('),', 'O'), ('Auf', 'O'), ('einem', 'B-OTH'), ('Auge', 'I-OTH'), ('blöd', 'I-OTH'), ('(', 'I-OTH'), ('„', 'O'), ('Auf', 'O'), ('einem', 'O'), ('Auge', 'O'), ('blind', 'O'), ('),', 'O'), ('Am', 'O'), ('Wasser', 'O'), ('gebaut', 'O'), ('(', 'B-OTH'), ('„', 'I-OTH'), ('Nah', 'I-OTH'), ('am', 'O'), ('Wasser', 'O'), ('gebaut', 'O'), (')', 'O'), ('und', 'O'), ('Strom', 'O'), ('und', 'O'), ('Drang', 'O'), ('(', 'O'), ('„', 'B-OTH'), ('Sturm', 'I-OTH'), ('und', 'I-OTH'), ('Drang', 'O'), (').', 'O')]
1 [('Asolo', 'B-LOC'), ('Gentile', 'B-PER'), ('Bellini', 'I-PER'), (':', 'O'), ('Das', 'B-OTH'), ('Kreuzeswunder', 'I-OTH'), ('auf', 'I-OTH'), ('der', 'I-OTH'), ('Brücke', 'I-OTH'), ('von', 'I-OTH'), ('San', 'I-OTH'), ('Lorenzo', 'I-OTH'), ('(', 'O'), ('1500', 'O'), (')', 'O'), ('Nach', 'O'), ('ihrer', 'O'), ('Rückkehr', 'O'), ('nach', 'O'), ('Venedig', 'B-LOC'), ('wurde', 'O'), ('sie', 'O'), ('mit', 'O'), ('der', 'O'), ('Stadt', 'O'), ('und', 'O'), ('Burg', 'O'), ('Asolo', 'B-LOC'), ('in', 'O'), ('Oberitalien', 'B-LOC'), ('entschädigt,', 'O'), ('wo', 'O'), ('sie', 'O'), ('für', 'O'), ('die', 'O'), ('nächsten', 'O'), ('zwanzig', 'O'), ('Jahre', 'O'), ('ihren', 'O'), ('Wohnsitz', 'O'), ('nahm.', 'O')]
learn.fit_one_cycle(1, lr_max=3e-5, moms=(0.8,0.7,0.8), cbs=fit_cbs)
epoch train_loss valid_loss accuracy precision recall f1 time
0 0.101921 0.065371 0.980001 0.859387 0.829603 0.844232 16:14
/home/wgilliam/miniconda3/envs/blurr/lib/python3.9/site-packages/seqeval/metrics/v1.py:57: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
learn.show_results(learner=learn, max_n=2, trunc_at=10)
token / target label / predicted label
0 [('Darüber', 'O', 'O'), ('hinaus', 'O', 'O'), ('produziert', 'O', 'O'), ('der', 'O', 'O'), ('hr', 'B-ORG', 'O'), ('allein', 'O', 'O'), ('oder', 'O', 'O'), ('federführend', 'O', 'O'), ('mit', 'O', 'O'), ('anderen', 'O', 'O')]
1 [('Trotz', 'O', 'O'), ('meiner', 'O', 'O'), ('beiden', 'O', 'O'), ('Tore', 'O', 'O'), ('habe', 'O', 'O'), ('ich', 'O', 'O'), ('die', 'O', 'O'), ('größte', 'O', 'O'), ('Freude', 'O', 'O'), ('in', 'O', 'O')]
print(learn.token_classification_report)
              precision    recall  f1-score   support

         LOC       0.90      0.90      0.90       766
    LOCderiv       0.92      0.86      0.89       252
     LOCpart       0.69      0.69      0.69        52
         ORG       0.81      0.73      0.77       554
    ORGderiv       0.00      0.00      0.00         0
     ORGpart       0.82      0.73      0.77       103
         OTH       0.70      0.69      0.69       275
    OTHderiv       0.75      0.57      0.65        21
     OTHpart       0.06      0.33      0.10         3
         PER       0.95      0.92      0.93       736
    PERderiv       0.00      0.00      0.00         0
     PERpart       0.11      0.25      0.15         8

   micro avg       0.86      0.83      0.84      2770
   macro avg       0.56      0.56      0.55      2770
weighted avg       0.87      0.83      0.85      2770

txt = "I live in California, but I'd love to travel to Scotland and visit the Macallan distillery."
txt2 = "Jane Doe loves working for ohmeow.com."
res = learn.blurr_predict_tokens([txt.split(), txt2.split()])
for r in res: print(f'{[(tok, lbl) for tok,lbl in zip(r[0],r[1]) ]}\n')
[('I', 'O'), ('live', 'O'), ('in', 'O'), ('California,', 'B-LOC'), ('but', 'O'), ("I'd", 'O'), ('love', 'O'), ('to', 'O'), ('travel', 'O'), ('to', 'O'), ('Scotland', 'B-LOC'), ('and', 'O'), ('visit', 'O'), ('the', 'O'), ('Macallan', 'B-ORG'), ('distillery.', 'O')]

[('Jane', 'B-PER'), ('Doe', 'I-PER'), ('loves', 'O'), ('working', 'O'), ('for', 'O'), ('ohmeow.com.', 'B-OTH')]

Question Answering

raw_datasets = load_dataset('squad_v2')
print(f'{raw_datasets}\n')
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')
Reusing dataset squad_v2 (/home/wgilliam/.cache/huggingface/datasets/squad_v2/squad_v2/2.0.0/ba48bc29b974701e9ba8d80ac94f3e3df924aba41b764dcf9851debea7c672e4)
DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 130319
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 11873
    })
})

{'answers': {'answer_start': [269], 'text': ['in the late 1990s']}, 'context': 'Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".', 'id': '56be85543aeaaa14008c9063', 'question': 'When did Beyonce start becoming popular?', 'title': 'Beyoncé'}

{'id': Value(dtype='string', id=None), 'title': Value(dtype='string', id=None), 'context': Value(dtype='string', id=None), 'question': Value(dtype='string', id=None), 'answers': Sequence(feature={'text': Value(dtype='string', id=None), 'answer_start': Value(dtype='int32', id=None)}, length=-1, id=None)}

train_ds = raw_datasets['train'].select(range(1000))

We use the `preprocess_func` here because the preprocessing depends on the Hugging Face tokenizer, which varies with the pretrained model we use for the task.

def preprocess_ds(ds, hf_arch, hf_config, hf_tokenizer, hf_model, max_seq_len, 
                  context_attr, question_attr, answer_text_attr, tok_ans_start, tok_ans_end):
    
    def _preprocess(item):
        tok_kwargs = {}
        if(hf_tokenizer.padding_side == 'right'):
            tok_input = hf_tokenizer.convert_ids_to_tokens(hf_tokenizer.encode(item[question_attr], item[context_attr]), 
                                                           **tok_kwargs)
        else:
            tok_input = hf_tokenizer.convert_ids_to_tokens(hf_tokenizer.encode(item[context_attr], item[question_attr]), 
                                                           **tok_kwargs)

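        # SQuAD v2 has unanswerable questions whose 'answers' text list is empty
        ans_texts = item['answers']['text']
        tok_ans = hf_tokenizer.tokenize(str(ans_texts[0]), **tok_kwargs) if len(ans_texts) > 0 else []
        
        # (0, 0) means "no answer": unanswerable, not found, or input too long to search
        start_idx, end_idx = 0,0
        
        if(len(tok_ans) > 0 and len(tok_input) < max_seq_len):
            for idx in range(len(tok_input)):
                # first window of input tokens that exactly matches the answer tokens
                if (tok_input[idx:idx + len(tok_ans)] == tok_ans):
                    start_idx, end_idx = idx, idx + len(tok_ans)
                    break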

        item['tokenized_input'] = tok_input
        item['tokenized_input_len'] = len(tok_input)
        item['tok_answer_start'] = start_idx
        item['tok_answer_end'] = end_idx

        return item
    
    ds = ds.map(_preprocess)
    return ds
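The core of `_preprocess` is a search for the first window of input tokens that exactly matches the tokenized answer. The search returns an exclusive end index, so `(164, 166)` in the batch below spans two tokens. The pure-Python sketch below shows just that search. It uses made-up WordPiece-style tokens instead of a real tokenizer, and `find_answer_span` is a hypothetical helper name.

```python
from typing import List, Tuple


def find_answer_span(tok_input: List[str], tok_ans: List[str]) -> Tuple[int, int]:
    """Return (start, end) of the first occurrence of tok_ans in tok_input, end exclusive.

    Returns (0, 0) when the answer is empty or not found, mirroring the
    "no answer" convention used in _preprocess.
    """
    if not tok_ans:
        return 0, 0
    n = len(tok_ans)
    for idx in range(len(tok_input) - n + 1):
        if tok_input[idx:idx + n] == tok_ans:
            return idx, idx + n
    return 0, 0


# hypothetical question + context tokens
tok_input = ['[CLS]', 'which', 'band', '?', '[SEP]', 'she', 'sang', 'with',
             'cold', '##play', '.', '[SEP]']
print(find_answer_span(tok_input, ['cold', '##play']))  # (8, 10)
print(find_answer_span(tok_input, ['u2']))              # (0, 0)
```

Because the end index is exclusive, slicing with `tok_input[start:end]` recovers the answer tokens exactly.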
pretrained_model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'

learn = BlearnerForQuestionAnswering.from_dataframe(train_ds, pretrained_model_name,
                                                    preprocess_func=preprocess_ds, max_seq_len=256,
                                                    dblock_splitter=RandomSplitter(), dl_kwargs={ 'bs': 4 })
learn = learn.to_fp16()
learn.dls.show_batch(dataloaders=learn.dls, max_n=2, trunc_at=500)
text start/end answer
0 with what british band did beyonce perform on their album? at the 57th annual grammy awards in february 2015, beyonce was nominated for six awards, ultimately winning three : best r & b performance and best r & b song for " drunk in love ", and best surround sound album for beyonce. she was nominated for album of the year but the award was won by beck for his morning phase album. in august, the cover of the september issue of vogue magazine was unveiled online, beyonce as the cover star, becomin (164, 166) coldplay
1 beyonce had singers in the background known by the name as? in 2006, beyonce introduced her all - female tour band suga mama ( also the name of a song in b'day ) which includes bassists, drummers, guitarists, horn players, keyboardists and percussionists. her background singers, the mamas, consist of montina cooper - donnell, crystal collins and tiffany monique riddick. they made their debut appearance at the 2006 bet awards and re - appeared in the music videos for " irreplaceable " and " green (63, 66) the mamas
learn.fit_one_cycle(1, lr_max=1e-3)
epoch train_loss valid_loss time
0 2.018549 2.108820 00:53
learn.show_results(learner=learn, skip_special_tokens=True, max_n=2, trunc_at=500)
text start/end answer pred start/end pred answer
0 when did beyonce receive the legend award? beyonce has received numerous awards. as a solo artist she has sold over 15 million albums in the us, and over 118 million records worldwide ( a further 60 million additionally with destiny's child ), making her one of the best - selling music artists of all time. the recording industry association of america ( riaa ) listed beyonce as the top certified artist of the 2000s, with a total of 64 certifications. her songs " crazy in love ", " single ladies (0, 0) (244, 248) 2008 world music awards
1 which year did beyonce and her father part business ways? beyonce announced a hiatus from her music career in january 2010, heeding her mother's advice, " to live life, to be inspired by things again ". during the break she and her father parted ways as business partners. beyonce's musical break lasted nine months and saw her visit multiple european cities, the great wall of china, the egyptian pyramids, australia, english music festivals and various museums and ballet performances. (23, 24) 2010 (23, 24) 2010
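`show_results` lists predicted `(start, end)` pairs next to the targets. One common way to turn a QA model's start and end logits into such a pair is to choose the highest-scoring combination with `start <= end` and a bounded answer length, rather than taking the two argmaxes independently (which can produce an end before the start). The sketch below assumes that approach and an end-exclusive span like the targets above; it is not necessarily how Blurr decodes its predictions, and the logits are invented for illustration.

```python
from typing import List, Tuple

def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Pick (start, end) maximizing start_logits[s] + end_logits[e].

    Only pairs with s <= e < s + max_answer_len are considered; the
    returned end is exclusive (e + 1) so tokens[start:end] is the answer.
    """
    best, best_score = (0, 0), float('-inf')
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e + 1), score
    return best

tokens = ['[CLS]', 'which', 'year', '?', '[SEP]', 'in', '2010', '[SEP]']
start = [0.1, 0.0, 0.2, 0.0, 0.0, 0.5, 3.0, 0.0]
end   = [0.0, 0.0, 0.1, 0.0, 0.0, 0.2, 2.5, 0.4]
s, e = best_span(start, end)
print(s, e, tokens[s:e])  # 6 7 ['2010']
```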

Language modeling

raw_datasets = load_dataset('wikitext', 'wikitext-2-raw-v1')
print(f'{raw_datasets}\n')
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')
Reusing dataset wikitext (/home/wgilliam/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20)
DatasetDict({
    test: Dataset({
        features: ['text'],
        num_rows: 4358
    })
    train: Dataset({
        features: ['text'],
        num_rows: 36718
    })
    validation: Dataset({
        features: ['text'],
        num_rows: 3760
    })
})

{'text': ''}

{'text': Value(dtype='string', id=None)}

train_ds = raw_datasets['train'].select(range(1000))
valid_ds = raw_datasets['validation'].select(range(1000))
n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])
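# Blank lines are overwritten with a two-space string instead of being filtered out;
# filtering would shift row positions and invalidate valid_idxs computed above
def remove_empty_text(example):
    if (example['text'].strip() == ''): example['text'] = '  '
    return example
raw_ds = raw_ds.map(remove_empty_text)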
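Because the training and validation rows are concatenated into one dataset, the validation rows occupy positions `n_train` through `n_train + n_valid - 1`, and `IndexSplitter(valid_idxs)` assigns every other position to training. The plain-Python sketch below approximates that split so the index arithmetic can be checked without fastai; `index_split` is a hypothetical helper, not fastai's implementation.

```python
from typing import List, Sequence, Tuple

def index_split(n_items: int, valid_idxs: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split range(n_items) into (train, valid), where valid is given explicitly."""
    valid_set = set(valid_idxs)
    train = [i for i in range(n_items) if i not in valid_set]
    return train, list(valid_idxs)

n_train, n_valid = 1000, 1000  # sizes selected in the cells above
valid_idxs = list(range(n_train, n_train + n_valid))
train, valid = index_split(n_train + n_valid, valid_idxs)
print(train[0], train[-1], valid[0], valid[-1])  # 0 999 1000 1999
```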

Causal language modeling

learn = BlearnerForLM.from_dictionaries(raw_ds, 'gpt2', text_attr='text', 
                                        lm_strategy_cls=CausalLMStrategy,
                                        dblock_splitter=IndexSplitter(valid_idxs), 
                                        dl_kwargs={'bs':2}).to_fp16()
Using pad_token, but it is not set yet.
learn.dls.show_batch(dataloaders=learn.dls, max_n=2, trunc_at=250)
text target
0 At the same time, a local court in Germany ruled that the television rights to the FIA European Truck Racing Cup ( passed to Ecclestone by the FIA the previous year, along with all other FIA authorised championships ) should be returned to the serie the same time, a local court in Germany ruled that the television rights to the FIA European Truck Racing Cup ( passed to Ecclestone by the FIA the previous year, along with all other FIA authorised championships ) should be returned to the series o
1 On 23 May 1915, between two and four hours after news of the Italian declaration of war reached the main Austro @-@ Hungarian naval base at Pola, Zrínyi and the rest of the fleet departed to bombard the Italian and Montenegrin coast. Their focus was 23 May 1915, between two and four hours after news of the Italian declaration of war reached the main Austro @-@ Hungarian naval base at Pola, Zrínyi and the rest of the fleet departed to bombard the Italian and Montenegrin coast. Their focus was on
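In the batch above, each target is its input shifted left by one token ("At the same time..." becomes "the same time..."), which is the usual causal LM objective: predict token *t+1* from tokens *0..t*. The sketch below builds such a pair from a list of token ids; it is an illustration of the objective, not Blurr's `CausalLMStrategy` code, and `causal_lm_pair` is a hypothetical name.

```python
from typing import List, Tuple

def causal_lm_pair(token_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Return (inputs, targets) where targets[i] is the token that follows inputs[i]."""
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a pair")
    return token_ids[:-1], token_ids[1:]

inputs, targets = causal_lm_pair([10, 11, 12, 13])
print(inputs, targets)  # [10, 11, 12] [11, 12, 13]
```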
learn.fit_one_cycle(1, lr_max=3e-4, cbs=[BlearnerForLM.get_metrics_cb()])
epoch train_loss valid_loss perplexity lm_accuracy time
0 2.083310 2.177482 8.824057 0.350493 17:18
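The `perplexity` column is consistent with the exponential of `valid_loss` (the mean per-token cross-entropy, in nats): exp(2.177482) is about 8.824. The masked LM run later in this section shows the same relationship (valid_loss 0.781689, perplexity 2.185160). A quick check, using only the numbers from these logs:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    """Perplexity from a mean per-token cross-entropy loss measured in nats."""
    return math.exp(mean_ce_loss)

# Validation losses copied from the training logs in this section
print(round(perplexity(2.177482), 3))  # causal LM (gpt2): 8.824
print(round(perplexity(0.781689), 3))  # masked LM (bert-base-uncased): 2.185
```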
learn.show_results(learner=learn, max_n=2, trunc_at=500)
text target prediction
0 Moving forward under a creeping barrage as they moved beyond Tokinotu, the 24th Infantry Battalion was the first to contact the Japanese, carrying out an attack against Japanese positions around Dawe's Creek on 17 April. Supported by a troop of Matilda tanks from the 2 / 4th Armoured Regiment, an artillery barrage which fired over 700 shells, two infantry companies —'C'and'D'— from the 24th attacked the position while another —'A'Company — carried out a flanking manoeuvre to cut another track f forward under a creeping barrage as they moved beyond Tokinotu, the 24th Infantry Battalion was the first to contact the Japanese, carrying out an attack against Japanese positions around Dawe's Creek on 17 April. Supported by a troop of Matilda tanks from the 2 / 4th Armoured Regiment, an artillery barrage which fired over 700 shells, two infantry companies —'C'and'D'— from the 24th attacked the position while another —'A'Company — carried out a flanking manoeuvre to cut another track further to, the new shadow of the approached forward theushimaawaomi, the Japaneseth Division Division was able first to move the enemy forces and out a attack on the forces. theashi.s village. the April. The by the Japanese of Japanesehews,, the 1nd 3th Tankoured Division, the infantry unit from was at the rounds, the of divisions, thesajs'D's and the 2th Infantry the Japanese. the battalion'S'— —'out a counteranking attackre — the off Japanese through south. thea. theons.. Japanese flank column was'
1 Cooke led his small squadron past Corregidor on 15 January and turned south. Four days later in a storm one of the gunboats broke its tow line and was never seen again, lost with its twelve crew. The frigates subsequently scouted Mindanao before reaching Zamboanga on 22 January. There Cooke raised Spanish colours in an attempt to deceive the authorities into supplying food and water to his squadron but Sybille grounded on a sandbank at the entrance to the port which raised the suspicions of a g led his small squadron past Corregidor on 15 January and turned south. Four days later in a storm one of the gunboats broke its tow line and was never seen again, lost with its twelve crew. The frigates subsequently scouted Mindanao before reaching Zamboanga on 22 January. There Cooke raised Spanish colours in an attempt to deceive the authorities into supplying food and water to his squadron but Sybille grounded on a sandbank at the entrance to the port which raised the suspicions of a guardbo was the team band of theonidus, the April, captured south to The days later, the convoy, of the squadronboats was off wayline and sank sunk seen again. leaving sight the crew crew. shipid were returnedutt theeno and being theanzanga. 16 January. , was his flags and the attempt to lure the enemy. believing the to supplies to the squadron. waskesala was the the reefbar in the end to the island. was fears alarm of the Spanish.. by the Spanish. Zamboanga. whooo deinñaol. governor of the ship squa

`Learner.blurr_generate` works here too.

learn.blurr_generate('Blurr is fun to work with because', max_length=50, do_sample=True, top_k=25)
[' Blurr is fun to work with because a few things happen along the way that give rise to many more fun aspects of the game : the time of the game is limited to only 3 days ( 3 weeks = 2 months ) ; all other characters']
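With `do_sample=True, top_k=25`, Hugging Face's `generate` samples each next token from only the 25 most likely candidates instead of always taking the single best one. The pure-Python sketch below shows that idea for one decoding step over a made-up logit vector; `top_k_sample` is illustrative and is not the `transformers` implementation.

```python
import math
import random
from typing import List, Optional

def top_k_sample(logits: List[float], k: int, rng: Optional[random.Random] = None) -> int:
    """Sample a token id from the k highest logits, renormalized with a softmax."""
    if rng is None:
        rng = random.Random()
    k = max(1, min(k, len(logits)))
    # indices of the k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

logits = [2.0, 0.5, 1.5, -1.0, 0.0]
rng = random.Random(0)
print([top_k_sample(logits, k=2, rng=rng) for _ in range(10)])  # only ids 0 and 2 can appear
```

With `k=1` this reduces to greedy decoding; larger `k` trades coherence for variety.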

Masked language modeling

learn = BlearnerForLM.from_dictionaries(raw_ds, 'bert-base-uncased', text_attr='text', 
                                        lm_strategy_cls=BertMLMStrategy,
                                        dblock_splitter=IndexSplitter(valid_idxs), 
                                        dl_kwargs={'bs':2}).to_fp16()
learn.fit_one_cycle(1, lr_max=3e-4, cbs=[BlearnerForLM.get_metrics_cb()])
epoch train_loss valid_loss perplexity lm_accuracy time
0 0.867960 0.781689 2.185160 0.672552 17:19
learn.show_results(learner=learn, max_n=2, trunc_at=500)
text target prediction
0 moving forward [MASK] a creeping [barrage] as they moved beyond to ##kin ##ot [##u] , the [consolidated] infantry [MASK] was [MASK] first to contact the japanese , carrying out an attack against japanese positions around [MASK] ##we ' [MASK] [cost] on 17 april . supported by a troop of matilda tanks from [the] 2 / 4th armoured regiment , an [MASK] barrage which [MASK] over 700 shells , two infantry companies — ' c ' and ' [MASK] ' — from the 24th attacked the [position] while another — ' a ' company [MASK] carried out a flanking man ##oe ##u ##vre to cut another track further north towards [MASK] ##ra and hat ##ai [MASK] the left forward company — ' d ' company — [MASK] [MASK] objective [MASK] trouble ; however , ' c ' company — on the right along with the [MASK] of tanks — came up against [MASK] japanese resistance and became bog ##ged down [MASK] ' a ' company also became [MASK] ##bro ##iled in heavy fighting along the hat ##ai track . in support of ' a ' [MASK] , mat ##ili ##das came forward and [raked] the jungle , hacking through the under ##growth [MASK] reveal several [MASK] pill ##box [##es] , which were destroyed by [MASK] australian [MASK] . as night fell , ' c ' company dug in before res ##uming the attack the next morning . [MASK] were brought forward , as was a bull [MASK] ##zer , and the gap was bridge ##d . amidst heavy fighting , the australians forced their way across the creek . by the time that the position [MASK] been taken in the afternoon [and] the infantry [logs] advanced to the line [MASK] exploitation 400 yd [MASK] 370 m ) beyond [MASK] creek , 37 [MASK] had been killed [MASK] the loss of [MASK] australians killed and 19 wounded [MASK] after this , the australians [MASK] their [semester] towards sin ##dou creek , which was [MASK] further 1 mi ( 1 [MASK] [MASK] @ 6 km [)] to the southeast . 
in response , the japanese launched a number [MASK] determined counterattack ##s over the [MASK] [adhere] the following week , although [MASK] were turned back . during this time , the [MASK] sent a number of patrols out [MASK] front of their forward elements , one of which managed [MASK] slip through the japanese [MASK] positions [MASK] side of the bu ##in road and carried out [darmstadt] reconnaissance of the hong ##ora ##i [MASK] about 1 @ , [MASK] 000 [MASK] ( 910 [MASK] [MASK] south of [MASK] main crossing . further patrols were carried out , as well as a [MASK] of ambush ##es , before the advance was [MASK] [MASK] 26 [MASK] . moving forward [under] a creeping [barrage] as they moved beyond to ##kin ##ot [##u] , the [24th] infantry [battalion] was [the] first to contact the japanese , carrying out an attack against japanese positions around [da] ##we ' [s] [creek] on 17 april . supported by a troop of matilda tanks from [the] 2 / 4th armoured regiment , an [artillery] barrage which [fired] over 700 shells , two infantry companies — ' c ' and ' [d] ' — from the 24th attacked the [position] while another — ' a ' company [—] carried out a flanking man ##oe ##u ##vre to cut another track further north towards [kinda] ##ra and hat ##ai [.] the left forward company — ' d ' company — [reached] [its] objective [without] trouble ; however , ' c ' company — on the right along with the [troop] of tanks — came up against [stiff] japanese resistance and became bog ##ged down [.] ' a ' company also became [em] ##bro ##iled in heavy fighting along the hat ##ai track . in support of ' a ' [company] , mat ##ili ##das came forward and [raked] the jungle , hacking through the under ##growth [to] reveal several [japanese] pill ##box [##es] , which were destroyed by [the] australian [armour] . as night fell , ' c ' company dug in before res ##uming the attack the next morning . [engineers] were brought forward , as was a bull [##do] ##zer , and the gap was bridge ##d . 
amidst heavy fighting , the australians forced their way across the creek . by the time that the position [had] been taken in the afternoon [and] the infantry [had] advanced to the line [of] exploitation 400 yd [(] 370 m ) beyond [the] creek , 37 [japanese] had been killed [for] the loss of [seven] australians killed and 19 wounded [.] after this , the australians [continued] their [advance] towards sin ##dou creek , which was [a] further 1 mi ( 1 [@] [.] @ 6 km [)] to the southeast . in response , the japanese launched a number [of] determined counterattack ##s over the [course] [of] the following week , although [these] were turned back . during this time , the [australians] sent a number of patrols out [in] front of their forward elements , one of which managed [to] slip through the japanese [defensive] positions [either] side of the bu ##in road and carried out [a] reconnaissance of the hong ##ora ##i [river] about 1 @ , [@] 000 [yd] ( 910 [m] [)] south of [the] main crossing . further patrols were carried out , as well as a [number] of ambush ##es , before the advance was [resumed] [on] 26 [april] . moving forward [with] a creeping [barrage] as they moved beyond to ##kin ##ot [##u] , the [consolidated] infantry [company] was [the] first to contact the japanese , carrying out an attack against japanese positions around [ma] ##we ' [s] [ridge] on 17 april . supported by a troop of matilda tanks from [the] 2 / 4th armoured regiment , an [artillery] barrage which [fired] over 700 shells , two infantry companies — ' c ' and ' [d] ' — from the 24th attacked the [position] while another — ' a ' company [,] carried out a flanking man ##oe ##u ##vre to cut another track further north towards [tai] ##ra and hat ##ai [.] the left forward company — ' d ' company — [reached] [the] objective [in] trouble ; however , ' c ' company — on the right along with the [company] of tanks — came up against [the] japanese resistance and became bog ##ged down [.] 
' a ' company also became [em] ##bro ##iled in heavy fighting along the hat ##ai track . in support of ' a ' [company] , mat ##ili ##das came forward and [raked] the jungle , hacking through the under ##growth [to] reveal several [japanese] pill ##box [##es] , which were destroyed by [the] australian [artillery] . as night fell , ' c ' company dug in before res ##uming the attack the next morning . [reinforcements] were brought forward , as was a bull [##do] ##zer , and the gap was bridge ##d . amidst heavy fighting , the australians forced their way across the creek . by the time that the position [had] been taken in the afternoon [and] the infantry [had] advanced to the line [of] exploitation 400 yd [(] 370 m ) beyond [the] creek , 37 [men] had been killed [,] the loss of [15] australians killed and 19 wounded [.] after this , the australians [continued] their [advance] towards sin ##dou creek , which was [a] further 1 mi ( 1 [@] [.] @ 6 km [)] to the southeast . in response , the japanese launched a number [of] determined counterattack ##s over the [ridge] [over] the following week , although [they] were turned back . during this time , the [australians] sent a number of patrols out [in] front of their forward elements , one of which managed [to] slip through the japanese [forward] positions [either] side of the bu ##in road and carried out [a] reconnaissance of the hong ##ora ##i [ridge] about 1 @ , [@] 000 [yd] ( 910 [km] [)] south of [the] main crossing . further patrols were carried out , as well as a [number] of ambush ##es , before the advance was [halted] [on] 26 [april] .
1 shortly after the islamist inspired terrorist attacks in new york and washington on 11 september 2001 , australian forces were committed to the american @ - @ led [MASK] coalition [MASK] terrorism . [MASK] ad [##f] ' s [MASK] visible contribution — code ##name ##d operation slip ##per — has been a [special] forces [MASK] [MASK] operating in afghanistan from 2001 [MASK] [MASK] [MASK] again from [MASK] @ - [MASK] 2005 to [MASK] against the [MASK] . over [time] the australian commitment has grown , with [of] addition of further ground forces in the form of a reconstruction task force from [MASK] to [MASK] security [,] reconstruction and [MASK] [mentor] and train [MASK] afghan national army . australia has [MASK] contributed a frigate and two ap @ - @ 3 ##c orion surveillance aircraft and [MASK] c [MASK] - @ [MASK] hercules transport aircraft to international operations in the [MASK] gulf and indian ocean since 2001 , supporting [MASK] [MASK] operations in afghanistan and those in [MASK] under [MASK] catalyst [MASK] a [detachment] of four f / a @ [-] @ 18 [MASK] fighter @ - [MASK] bombers [MASK] based at diego garcia from late @ - [MASK] 2001 to mid @ - @ 2002 , while [MASK] boeing 70 ##7 air @ - @ to @ - @ air ref ##uel [MASK] aircraft were also [MASK] in mana ##s air base in kyrgyzstan to [MASK] support to coalition aircraft operating in afghan [MASK] [MASK] were later withdrawn . [MASK] [MASK] [##nts] task group was deployed to support the reconstruction task ##force in [MASK] [MASK] . in addition to radar [MASK] [dev] logistics [mustache] intelligence officers [MASK] and security personnel , [MASK] brought [MASK] [MASK] of [MASK] personnel in [MASK] to 950 by mid @ - @ 2007 [MASK] [##ld] further small [MASK] to 1 @ [,] @ 000 in mid @ [MASK] @ 2008 , 1 @ , @ [MASK] [MASK] early 2009 [nominal] 1 @ , [MASK] 550 in mid [MASK] - @ 2009 . 
shortly after the islamist inspired terrorist attacks in new york and washington on 11 september 2001 , australian forces were committed to the american @ - @ led [international] coalition [against] terrorism . [the] ad [##f] ' s [most] visible contribution — code ##name ##d operation slip ##per — has been a [special] forces [task] [group] operating in afghanistan from 2001 [to] [2002] [and] again from [mid] @ - [@] 2005 to [fight] against the [taliban] . over [time] the australian commitment has grown , with [the] addition of further ground forces in the form of a reconstruction task force from [2006] to [provide] security [,] reconstruction and [to] [mentor] and train [the] afghan national army . australia has [also] contributed a frigate and two ap @ - @ 3 ##c orion surveillance aircraft and [three] c [@] - @ [130] hercules transport aircraft to international operations in the [persian] gulf and indian ocean since 2001 , supporting [both] [the] operations in afghanistan and those in [iraq] under [operation] catalyst [.] a [detachment] of four f / a @ [-] @ 18 [hornet] fighter @ - [@] bombers [was] based at diego garcia from late @ - [@] 2001 to mid @ - @ 2002 , while [two] boeing 70 ##7 air @ - @ to @ - @ air ref ##uel [##ling] aircraft were also [based] in mana ##s air base in kyrgyzstan to [provide] support to coalition aircraft operating in afghan [airspace] [but] were later withdrawn . [a] [special] [operations] task group was deployed to support the reconstruction task ##force in [april] [2007] . in addition to radar [crews] [,] logistics [and] intelligence officers [,] and security personnel , [this] brought [the] [number] of [australian] personnel in [afghanistan] to 950 by mid @ - @ 2007 [,] [with] further small [increases] to 1 @ [,] @ 000 in mid @ [-] @ 2008 , 1 @ , @ [100] [in] early 2009 [and] 1 @ , [@] 550 in mid [@] - @ 2009 . 
shortly after the islamist inspired terrorist attacks in new york and washington on 11 september 2001 , australian forces were committed to the american @ - @ led [international] coalition [against] terrorism . [the] ad [##f] ' s [most] visible contribution — code ##name ##d operation slip ##per — has been a [special] forces [operation] [,] operating in afghanistan from 2001 [to] [and] [and] again from [mid] @ - [@] 2005 to [assist] against the [taliban] . over [time] the australian commitment has grown , with [the] addition of further ground forces in the form of a reconstruction task force from [2005] to [provide] security [,] reconstruction and [to] [of] and train [the] afghan national army . australia has [also] contributed a frigate and two ap @ - @ 3 ##c orion surveillance aircraft and [a] c [@] - @ [130] hercules transport aircraft to international operations in the [persian] gulf and indian ocean since 2001 , supporting [the] [security] operations in afghanistan and those in [afghanistan] under [the] catalyst [.] a [detachment] of four f / a @ [-] @ 18 [joint] fighter @ - [@] bombers [were] based at diego garcia from late @ - [@] 2001 to mid @ - @ 2002 , while [two] boeing 70 ##7 air @ - @ to @ - @ air ref ##uel [##er] aircraft were also [based] in mana ##s air base in kyrgyzstan to [provide] support to coalition aircraft operating in afghan [waters] [but] were later withdrawn . [the] [replica] [forces] task group was deployed to support the reconstruction task ##force in [the] [2007] . in addition to radar [,] [,] logistics [,] intelligence officers [,] and security personnel , [australia] brought [the] [number] of [australian] personnel in [afghanistan] to 950 by mid @ - @ 2007 [,] [,] further small [numbers] to 1 @ [,] @ 000 in mid @ [-] @ 2008 , 1 @ , @ [000] [in] early 2009 [,] 1 @ , [and] 550 in mid [@] - @ 2009 .
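In the `text` column, bracketed tokens mark positions chosen for prediction: most are shown as `[MASK]`, some are swapped for an unrelated word (`[darmstadt]`, `[mustache]`), and a few keep the original token (`[barrage]`, `[position]`). That pattern is consistent with the 80/10/10 scheme from the original BERT paper. The sketch below implements that scheme over token ids in plain Python; it assumes, but does not confirm, that `BertMLMStrategy` works this way, and the 15% selection rate and `-100` ignore label are common defaults rather than values taken from Blurr.

```python
import random
from typing import Collection, List, Optional, Tuple

def bert_mlm_mask(token_ids: List[int], mask_id: int, vocab_size: int,
                  special_ids: Collection[int] = (), mlm_prob: float = 0.15,
                  seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Apply BERT-style 80/10/10 masking and return (inputs, labels).

    labels keeps the original id at selected positions and -100 elsewhere,
    -100 being the default ignore_index of torch.nn.CrossEntropyLoss.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # never select special tokens; select roughly mlm_prob of the rest
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id                    # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random vocabulary token
        # remaining 10%: keep the original token
    return inputs, labels

# 101/102/103 are [CLS]/[SEP]/[MASK] in the bert-base-uncased vocabulary (30,522 entries)
ids = [101, 2746, 2830, 2104, 1037, 102]
inputs, labels = bert_mlm_mask(ids, mask_id=103, vocab_size=30522,
                               special_ids={101, 102}, mlm_prob=0.5, seed=0)
print(inputs)
print(labels)
```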
batch_tfm = get_blurr_tfm(learn.dls.before_batch)

`Learner.blurr_fill_mask` works here too.

learn.blurr_fill_mask(f'Blurr is a {batch_tfm.hf_tokenizer.mask_token}.', n_preds=5)
['Blurr is a cinematographer.',
 'Blurr is a runner.',
 'Blurr is a pseudonym.',
 'Blurr is a shooter.',
 'Blurr is a dog.']
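Conceptually, a fill-mask call scores every vocabulary entry at the mask position and substitutes the `n_preds` best candidates back into the sentence. The sketch below shows only that final step, with a toy dictionary of invented scores standing in for the model's logits at the `[MASK]` position; `fill_mask_top_n` is hypothetical and not Blurr's code, and a real implementation also has to handle subword pieces when decoding.

```python
from typing import Dict, List

def fill_mask_top_n(template: str, mask_token: str,
                    scores: Dict[str, float], n_preds: int) -> List[str]:
    """Replace mask_token with each of the n_preds highest-scoring candidate tokens."""
    best = sorted(scores, key=scores.get, reverse=True)[:n_preds]
    return [template.replace(mask_token, tok, 1) for tok in best]

toy_scores = {'dog': 1.2, 'runner': 2.3, 'pseudonym': 1.9, 'table': -0.5}
print(fill_mask_top_n('Blurr is a [MASK].', '[MASK]', toy_scores, n_preds=2))
# ['Blurr is a runner.', 'Blurr is a pseudonym.']
```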

Summarization

raw_datasets = load_dataset("cnn_dailymail", '3.0.0')
print(f'{raw_datasets}\n')
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')
Reusing dataset cnn_dailymail (/home/wgilliam/.cache/huggingface/datasets/cnn_dailymail/3.0.0/3.0.0/3cb851bf7cf5826e45d49db2863f627cba583cbc32342df7349dfe6c38060234)
DatasetDict({
    train: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 287113
    })
    validation: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 13368
    })
    test: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 11490
    })
})

{'article': 'It\'s official: U.S. President Barack Obama wants lawmakers to weigh in on whether to use military force in Syria. Obama sent a letter to the heads of the House and Senate on Saturday night, hours after announcing that he believes military action against Syrian targets is the right step to take over the alleged use of chemical weapons. The proposed legislation from Obama asks Congress to approve the use of military force "to deter, disrupt, prevent and degrade the potential for future uses of chemical weapons or other weapons of mass destruction." It\'s a step that is set to turn an international crisis into a fierce domestic political battle. There are key questions looming over the debate: What did U.N. weapons inspectors find in Syria? What happens if Congress votes no? And how will the Syrian government react? In a televised address from the White House Rose Garden earlier Saturday, the president said he would take his case to Congress, not because he has to -- but because he wants to. "While I believe I have the authority to carry out this military action without specific congressional authorization, I know that the country will be stronger if we take this course, and our actions will be even more effective," he said. "We should have this debate, because the issues are too big for business as usual." Obama said top congressional leaders had agreed to schedule a debate when the body returns to Washington on September 9. The Senate Foreign Relations Committee will hold a hearing over the matter on Tuesday, Sen. Robert Menendez said. Transcript: Read Obama\'s full remarks . Syrian crisis: Latest developments . U.N. inspectors leave Syria . Obama\'s remarks came shortly after U.N. inspectors left Syria, carrying evidence that will determine whether chemical weapons were used in an attack early last week in a Damascus suburb. 
"The aim of the game here, the mandate, is very clear -- and that is to ascertain whether chemical weapons were used -- and not by whom," U.N. spokesman Martin Nesirky told reporters on Saturday. But who used the weapons in the reported toxic gas attack in a Damascus suburb on August 21 has been a key point of global debate over the Syrian crisis. Top U.S. officials have said there\'s no doubt that the Syrian government was behind it, while Syrian officials have denied responsibility and blamed jihadists fighting with the rebels. British and U.S. intelligence reports say the attack involved chemical weapons, but U.N. officials have stressed the importance of waiting for an official report from inspectors. The inspectors will share their findings with U.N. Secretary-General Ban Ki-moon Ban, who has said he wants to wait until the U.N. team\'s final report is completed before presenting it to the U.N. Security Council. The Organization for the Prohibition of Chemical Weapons, which nine of the inspectors belong to, said Saturday that it could take up to three weeks to analyze the evidence they collected. "It needs time to be able to analyze the information and the samples," Nesirky said. He noted that Ban has repeatedly said there is no alternative to a political solution to the crisis in Syria, and that "a military solution is not an option." Bergen:  Syria is a problem from hell for the U.S. Obama: \'This menace must be confronted\' Obama\'s senior advisers have debated the next steps to take, and the president\'s comments Saturday came amid mounting political pressure over the situation in Syria. Some U.S. lawmakers have called for immediate action while others warn of stepping into what could become a quagmire. Some global leaders have expressed support, but the British Parliament\'s vote against military action earlier this week was a blow to Obama\'s hopes of getting strong backing from key NATO allies. 
On Saturday, Obama proposed what he said would be a limited military action against Syrian President Bashar al-Assad. Any military attack would not be open-ended or include U.S. ground forces, he said. Syria\'s alleged use of chemical weapons earlier this month "is an assault on human dignity," the president said. A failure to respond with force, Obama argued,  "could lead to escalating use of chemical weapons or their proliferation to terrorist groups who would do our people harm. In a world with many dangers, this menace must be confronted." Syria missile strike: What would happen next? Map: U.S. and allied assets around Syria . Obama decision came Friday night . On Friday night, the president made a last-minute decision to consult lawmakers. What will happen if they vote no? It\'s unclear. A senior administration official told CNN that Obama has the authority to act without Congress -- even if Congress rejects his request for authorization to use force. Obama on Saturday continued to shore up support for a strike on the al-Assad government. He spoke by phone with French President Francois Hollande before his Rose Garden speech. "The two leaders agreed that the international community must deliver a resolute message to the Assad regime -- and others who would consider using chemical weapons -- that these crimes are unacceptable and those who violate this international norm will be held accountable by the world," the White House said. Meanwhile, as uncertainty loomed over how Congress would weigh in, U.S. military officials said they remained at the ready. 5 key assertions: U.S. intelligence report on Syria . Syria: Who wants what after chemical weapons horror . Reactions mixed to Obama\'s speech . A spokesman for the Syrian National Coalition said that the opposition group was disappointed by Obama\'s announcement. "Our fear now is that the lack of action could embolden the regime and they repeat his attacks in a more serious way," said spokesman Louay Safi. 
"So we are quite concerned." Some members of Congress applauded Obama\'s decision. House Speaker John Boehner, Majority Leader Eric Cantor, Majority Whip Kevin McCarthy and Conference Chair Cathy McMorris Rodgers issued a statement Saturday praising the president. "Under the Constitution, the responsibility to declare war lies with Congress," the Republican lawmakers said. "We are glad the president is seeking authorization for any military action in Syria in response to serious, substantive questions being raised." More than 160 legislators, including 63 of Obama\'s fellow Democrats, had signed letters calling for either a vote or at least a "full debate" before any U.S. action. British Prime Minister David Cameron, whose own attempt to get lawmakers in his country to support military action in Syria failed earlier this week, responded to Obama\'s speech in a Twitter post Saturday. "I understand and support Barack Obama\'s position on Syria," Cameron said. An influential lawmaker in Russia -- which has stood by Syria and criticized the United States -- had his own theory. "The main reason Obama is turning to the Congress:  the military operation did not get enough support either in the world, among allies of the US or in the United States itself," Alexei Pushkov, chairman of the international-affairs committee of the Russian State Duma, said in a Twitter post. In the United States, scattered groups of anti-war protesters around the country took to the streets Saturday. "Like many other Americans...we\'re just tired of the United States getting involved and invading and bombing other countries," said Robin Rosecrans, who was among hundreds at a Los Angeles demonstration. What do Syria\'s neighbors think? Why Russia, China, Iran stand by Assad . Syria\'s government unfazed . 
After Obama\'s speech, a military and political analyst on Syrian state TV said Obama is "embarrassed" that Russia opposes military action against Syria, is "crying for help" for someone to come to his rescue and is facing two defeats -- on the political and military levels. Syria\'s prime minister appeared unfazed by the saber-rattling. "The Syrian Army\'s status is on maximum readiness and fingers are on the trigger to confront all challenges," Wael Nader al-Halqi said during a meeting with a delegation of Syrian expatriates from Italy, according to a banner on Syria State TV that was broadcast prior to Obama\'s address. An anchor on Syrian state television said Obama "appeared to be preparing for an aggression on Syria based on repeated lies." A top Syrian diplomat told the state television network that Obama was facing pressure to take military action from Israel, Turkey, some Arabs and right-wing extremists in the United States. "I think he has done well by doing what Cameron did in terms of taking the issue to Parliament," said Bashar Jaafari, Syria\'s ambassador to the United Nations. Both Obama and Cameron, he said, "climbed to the top of the tree and don\'t know how to get down." The Syrian government has denied that it used chemical weapons in the August 21 attack, saying that jihadists fighting with the rebels used them in an effort to turn global sentiments against it. British intelligence had put the number of people killed in the attack at more than 350. On Saturday, Obama said "all told, well over 1,000 people were murdered." U.S. Secretary of State John Kerry on Friday cited a death toll of 1,429, more than 400 of them children. No explanation was offered for the discrepancy. Iran: U.S. 
military action in Syria would spark \'disaster\' Opinion: Why strikes in Syria are a bad idea .', 'highlights': 'Syrian official: Obama climbed to the top of the tree, "doesn\'t know how to get down"\nObama sends a letter to the heads of the House and Senate .\nObama to seek congressional approval on military action against Syria .\nAim is to determine whether CW were used, not by whom, says U.N. spokesman .', 'id': '0001d1afc246a7964130f43ae940af6bc6c57f01'}

{'article': Value(dtype='string', id=None), 'highlights': Value(dtype='string', id=None), 'id': Value(dtype='string', id=None)}

train_ds = raw_datasets['train'].select(range(1000))
valid_ds = raw_datasets['validation'].select(range(1000))
n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])
learn = BlearnerForSummarization.from_dictionaries(raw_ds, 'facebook/bart-large-cnn', 
                                                text_attr='article', summary_attr='highlights', 
                                                max_length=256, max_target_length=130,
                                                dblock_splitter=IndexSplitter(valid_idxs),
                                                dl_kwargs={'bs':2}).to_fp16()
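`concatenate_datasets` places the validation rows directly after the training rows. As a result, the validation examples occupy positions `n_train` through `n_train + n_valid - 1` of `raw_ds`, which is exactly what `valid_idxs` records. The pure-Python sketch below shows how those indexes partition the combined dataset. `make_split_indexes` and `index_split` are hypothetical helpers written for illustration; `index_split` only approximates what fastai's `IndexSplitter` does and is not its actual source.

```python
def make_split_indexes(n_train, n_valid):
    """Return (train_idxs, valid_idxs) for a train set followed by a valid set."""
    # Training rows come first in the concatenated dataset...
    train_idxs = list(range(n_train))
    # ...and the validation rows are appended right after them.
    valid_idxs = list(range(n_train, n_train + n_valid))
    return train_idxs, valid_idxs


def index_split(n_items, valid_idxs):
    """Hypothetical stand-in for IndexSplitter: every index not in valid_idxs is training data."""
    valid_set = set(valid_idxs)
    train = [i for i in range(n_items) if i not in valid_set]
    return train, list(valid_idxs)


# With 1,000 training rows and 1,000 validation rows, as selected above:
train_idxs, valid_idxs = make_split_indexes(1000, 1000)
train_split, valid_split = index_split(2000, valid_idxs)
print(valid_split[0], valid_split[-1])  # 1000 1999
```

Computing the indexes from `num_rows` instead of hard-coding them means the split stays correct if you change the `.select(...)` ranges.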
learn.dls.show_batch(dataloaders=learn.dls, max_n=2, input_trunc_at=500, target_trunc_at=250)
text target
0 <s> (CNN) -- When Ji Yeqing awakened, she was already in the recovery room. Chinese authorities had dragged her out of her home and down four flights of stairs, she said, restraining and beating her husband as he tried to come to her aid. They whisked her into a clinic, held her down on a bed and forced her to undergo an abortion. Her offense? Becoming pregnant with a second child, in violation of China's one-child policy. "After the abortion, I felt empty, as if something was scooped out of me, China's one-child policy results in forced abortions and sterilizations, activists say.\nWomen tell of emotional and physical consequences from the procedures.\nActivist Chen Guangcheng works to advocate for victims of such practices.
1 <s> (CNN) -- Sitting incongruously among the hangars and laboratories of NASA's Ames Research Center in Silicon Valley is the squat facade of an old McDonald's. You won't get a burger there, though -- its cash registers and soft-serve machines have given way to old tape drives and modern computers run by a rogue team of hacker engineers who've rechristened the place McMoon's. These self-described techno-archaeologists have been on a mission to recover and digitize forgotten photos taken in the ' NASA-funded project has recovered 2,000 analog moon pictures.\nThe images were taken by the five Lunar Orbiter images between 1966 and 1967.\nProject uses old and modern technology to produce high-res copies of the originals.
metrics_cb = BlearnerForSummarization.get_metrics_cb()
learn.fit_one_cycle(1, lr_max=4e-5, cbs=[metrics_cb])
epoch train_loss valid_loss rouge1 rouge2 rougeL bertscore_precision bertscore_recall bertscore_f1 time
0 1.713355 1.822389 0.340577 0.141342 0.241867 0.869194 0.891537 0.880114 11:55
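The `rouge1`, `rouge2`, and `rougeL` columns measure unigram overlap, bigram overlap, and longest-common-subsequence overlap between the generated and reference summaries. The metrics callback computes them with whatever scoring library Blurr wraps, which is not shown here. The sketch below is a simplified, assumption-laden version for intuition only: it uses lowercase whitespace tokenization, applies no stemming, and reports F1. Its numbers will therefore not match the table exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, n_pred, n_ref):
    """Harmonic mean of precision (overlap / n_pred) and recall (overlap / n_ref)."""
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_pred, overlap / n_ref
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n=1):
    """Simplified ROUGE-N F1 with clipped n-gram counts."""
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # & keeps the minimum count of each shared n-gram
    return f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """Simplified ROUGE-L F1 based on the longest common subsequence."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    return f1(lcs_length(pred, ref), len(pred), len(ref))


pred, ref = "the cat sat on the mat", "the cat is on the mat"
print(round(rouge_n(pred, ref, 2), 2))  # 0.6
```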
learn.show_results(learner=learn, max_n=2, input_trunc_at=500, target_trunc_at=250)
text target prediction
0 (CNN)Reading the headlines out of Madison, Wisconsin, it's hard not to think about Ferguson, Missouri. But law enforcement's response to the shooting of 19-year-old Tony Robinson will not unfold in the same chaotic, violent and distrusting way as the shooting of 18-year-old Michael Brown, Madison's top police leaders vowed. "I think it's very clear that Madison, Wisconsin, is not Ferguson, Missouri," said Jim Palmer, the executive director of the Wisconsin Professional Police Association. The h Police officials in Madison say their responses to shooting by officer reflect their role in community.\nOne example: Madison chief talked to teen's family soon after shooting.\nA month went by before Ferguson chief apologized to Brown's family. Madison, Wisconsin, police chief: "We have to show affirmative steps in moving forward to bring community back into the fold"\n"I think it's very clear that Madison ... is not Ferguson, Missouri," said Jim Palmer, the executive director of the Wiscon
1 (CNN)More than two decades as a judge, prosecutor and defense lawyer could not prepare Susan Criss for the Texas murder trial of millionaire Robert Durst. The aftermath of the sensational 2003 trial of the scion of a New York real estate empire in many ways upended the life of the 54-year-old Galveston County-born lawyer who presided over the case. Durst admitted at trial that he killed neighbor Morris Black in Galveston and chopped up the body. There was the awkward encounter with Durst in an Former Texas judge says Durst case "affected me in many, many ways"\nDurst is charged with first-degree murder in the slaying of his longtime friend in 2000. Susan Criss served as judge in the 2003 murder trial of Robert Durst .\nCriss says she believes Durst was behind the cat killing, but admits police found no evidence .\nDurst admitted at trial that he killed neighbor Morris Black in Galveston, Texas,

Learner.blurr_generate works here too

test_article = """
About 10 men armed with pistols and small machine guns raided a casino in Switzerland and made off 
into France with several hundred thousand Swiss francs in the early hours of Sunday morning, police said. 
The men, dressed in black clothes and black ski masks, split into two groups during the raid on the Grand Casino 
Basel, Chief Inspector Peter Gill told CNN. One group tried to break into the casino's vault on the lower level 
but could not get in, but they did rob the cashier of the money that was not secured, he said. The second group 
of armed robbers entered the upper level where the roulette and blackjack tables are located and robbed the 
cashier there, he said. As the thieves were leaving the casino, a woman driving by and unaware of what was 
occurring unknowingly blocked the armed robbers' vehicles. A gunman pulled the woman from her vehicle, beat 
her, and took off for the French border. The other gunmen followed into France, which is only about 100 
meters (yards) from the casino, Gill said. There were about 600 people in the casino at the time of the robbery. 
There were no serious injuries, although one guest on the Casino floor was kicked in the head by one of the 
robbers when he moved, the police officer said. Swiss authorities are working closely with French authorities, 
Gill said. The robbers spoke French and drove vehicles with French lRicense plates. CNN's Andreena Narayan 
contributed to this report.
"""
outputs = learn.blurr_generate(test_article, num_return_sequences=3)

for idx, o in enumerate(outputs):
    print(f'=== Prediction {idx+1} ===\n{o}\n')
=== Prediction 1 ===
 A woman driving by unknowingly blocks the robbers' vehicles as they were leaving the casino .
A gunman pulls the woman from her vehicle, beat  her, and takes off for the French border .
The robbers spoke French and drove vehicles with French lRicense plates .
There were no serious injuries, although one guest was kicked in the head by one of the robbers .

=== Prediction 2 ===
 A woman driving by unknowingly blocks the robbers' vehicles as they were leaving the casino .
A gunman pulls the woman from her vehicle, beat her and takes off for the French border .
The robbers spoke French and drove vehicles with French lRicense plates, police say .
There were about 600 people in the casino at the time of the robbery, a police officer says .

=== Prediction 3 ===
 A woman driving by unknowingly blocks the robbers' vehicles as they were leaving the casino .
A gunman pulls the woman from her vehicle, beat  her, and takes off for the French border .
The robbers spoke French and drove vehicles with French lRicense plates .
There were about 600 people in the casino at the time of the robbery .

Translation

raw_datasets = load_dataset('wmt16', 'de-en')
print(f'{raw_datasets}\n')
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')
Reusing dataset wmt16 (/home/wgilliam/.cache/huggingface/datasets/wmt16/de-en/1.0.0/0d9fb3e814712c785176ad8cdb9f465fbe6479000ee6546725db30ad8a8b5f8a)
DatasetDict({
    train: Dataset({
        features: ['translation'],
        num_rows: 4548885
    })
    validation: Dataset({
        features: ['translation'],
        num_rows: 2169
    })
    test: Dataset({
        features: ['translation'],
        num_rows: 2999
    })
})

{'translation': {'de': 'Wiederaufnahme der Sitzungsperiode', 'en': 'Resumption of the session'}}

{'translation': Translation(languages=['de', 'en'], id=None)}

train_ds = raw_datasets['train'].select(range(1000))
valid_ds = raw_datasets['validation'].select(range(1000))
n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])
def make_dict(item):
    return item['translation']

raw_ds = raw_ds.map(make_dict)
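When the function passed to `Dataset.map` returns a dictionary, its keys are merged into each example. Every row therefore gains top-level `de` and `en` columns and keeps the original nested `translation` column. The plain-Python sketch below reproduces that effect on a list of dictionaries. It illustrates the resulting records only; it is not how the Datasets library implements `map`.

```python
def make_dict(item):
    # Same function as above: lift the nested language pair to the top level.
    return item['translation']


records = [{'translation': {'de': 'Wiederaufnahme der Sitzungsperiode',
                            'en': 'Resumption of the session'}}]

# Roughly what raw_ds.map(make_dict) yields per example: existing keys kept, returned keys added.
flattened = [{**r, **make_dict(r)} for r in records]
print(flattened[0]['de'], '->', flattened[0]['en'])
# Wiederaufnahme der Sitzungsperiode -> Resumption of the session
```

This flattening is what lets `src_lang_attr='de'` and `trg_lang_attr='en'` refer to plain top-level keys of each example.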
Loading cached processed dataset at /home/wgilliam/.cache/huggingface/datasets/wmt16/de-en/1.0.0/0d9fb3e814712c785176ad8cdb9f465fbe6479000ee6546725db30ad8a8b5f8a/cache-2d203da04becbf79.arrow
learn = BlearnerForTranslation.from_dictionaries(raw_ds, 'Helsinki-NLP/opus-mt-de-en', 
                                                src_lang_name='German', src_lang_attr='de', 
                                                trg_lang_name='English', trg_lang_attr='en', 
                                                dblock_splitter=IndexSplitter(valid_idxs),
                                                dl_kwargs={'bs':2}).to_fp16()
learn.dls.show_batch(dataloaders=learn.dls, max_n=2, input_trunc_at=500, target_trunc_at=250)
text target
0 ▁Angesichts▁dieser Situation▁muß▁aus dem▁Bericht, den das▁Parlament annimmt,▁klar▁hervorgehen,▁daß▁Maßnahmen▁notwendig▁sind, die▁eindeutig auf die▁Bekämpfung der relativen▁Armut und der Arbeitslosigkeit▁gerichtet▁sind.▁Maßnahmen▁wie die für diese▁Zwecke▁angemessene▁Verwendung der▁Strukturfonds, die▁häufig▁unsachgemäß▁eingesetzt▁werden, und▁zwar mit▁zentralen▁staatlichen▁Politiken, die▁Modernisierung der▁Bereiche Telekommunikation und▁Kommunikation,▁indem man vor▁allem die am▁wenigsten▁entwickelt Given this situation, the report approved by Parliament must highlight the need for measures that aim unequivocally to fight relative poverty and unemployment: measures such as the appropriate use of structural funds for these purposes, which are oft
1 Wir▁sollten▁auch die▁Bescheidenheit aufbringen einzusehen,▁daß wir nicht▁erst eine▁Woche vor der▁eigentlichen▁Debatte in▁diesem Haus die▁Mechanismen▁einrichten▁können, die▁erforderlich▁sind, um eine▁strategische▁Debatte▁durchzuführen, die sich nicht▁nur auf eine▁Präsentation und▁Erläuterungen▁seitens des▁Präsidenten der▁Kommission▁beschränkt,▁sondern in der▁auch ein▁Fünfjahresprogramm▁vorgelegt▁wird.▁Nur so▁können wir der▁Kommission▁rechtzeitig▁unsere▁Wünsche▁übermitteln und diese▁entsprechend▁d We should also have the humility to recognise that, if we wanted to have a strategic debate accompanied not just by a presentation and elucidation by the President of the Commission, but also by a five-year programme, we should have the mechanisms in
metrics_cb = BlearnerForTranslation.get_metrics_cb()
learn.fit_one_cycle(1, lr_max=4e-5, cbs=[metrics_cb])
[nltk_data] Downloading package wordnet to /home/wgilliam/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
epoch train_loss valid_loss bleu meteor sacrebleu time
0 1.239250 1.231428 0.331382 0.548100 31.743439 02:04
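The `bleu` and `sacrebleu` columns appear to report the same family of metric on different scales: roughly 0 to 1 for `bleu` versus 0 to 100 for `sacrebleu`. Differences in tokenization would account for the remaining gap. The minimal corpus-level BLEU sketch below is for intuition and is not sacreBLEU's implementation. It clips n-gram precisions for n = 1..4, takes their geometric mean, and applies a brevity penalty. It assumes whitespace tokenization, a single reference per hypothesis, and no smoothing.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Counter of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU with one reference per hypothesis, whitespace tokens, and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_toks, ref_toks = hyp.split(), ref.split()
        hyp_len += len(hyp_toks)
        ref_len += len(ref_toks)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_toks, n)
            ref_ngrams = ngram_counts(ref_toks, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())
    # Without smoothing, a single n-gram order with no matches makes BLEU zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


print(round(corpus_bleu(["I like drinking beer"], ["I like drinking beer"]), 3))  # 1.0
```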
learn.show_results(learner=learn, max_n=2, input_trunc_at=500, target_trunc_at=250)
text target prediction
0 ▁Deshalb▁besteht der▁Vorschlag der▁Fraktion der▁Sozialdemokratischen▁Partei▁Europas, den Sie▁erwähnt▁haben,▁darin, den▁Mittwoch▁als▁Termin der▁Vorstellung des▁Programms der▁Kommission Prodi für die▁Wahlperiode▁beizubehalten, und in▁dieses▁Programm▁auch das▁Verwaltungsreformprojekt▁einzubeziehen, da wir▁andernfalls in eine paradoxe Situation▁geraten▁könnten: Mit der Ausrede, der▁Wortlaut liege nicht vor,▁wird▁einerseits dem▁Präsidenten der▁Kommission das▁Recht▁abgesprochen, in▁diesem▁Parlament zu Therefore, the proposal of the Group of the Party of European Socialists, and which you have mentioned, is that the Prodi Commission present its legislative programme on Wednesday, including its proposed administrative reform, because, otherwise, we That is why the proposal of the Group of the Party of European Socialists, which you have mentioned, is to keep Wednesday as the date for the presentation of the Prodi Commission' s programme for the parliamentary term, and to include in this program
1 ▁Aufgrund▁meinen▁Vorstellungen vom▁Aufbau▁Europas und von▁regionaler▁Entwicklungspolitik im▁besonderen▁halte▁ich das für eine Situation, die▁ich nicht▁akzeptieren▁kann.▁Ich▁habe die▁Absicht, im▁Rahmen▁meiner▁Möglichkeiten und mit▁Ihrer▁Unterstützung▁sämtliche Mittel, für die▁ich▁Verantwortung▁trage, für eine▁verbesserte▁soziale,▁menschliche und▁territoriale▁Kohäsion zu▁verwenden, um zu▁verhindern,▁daß es,▁wie▁ich es vor▁diesem▁Hause▁nannte, ein Europa der▁zwei▁Geschwindigkeiten▁gibt, ein Europa As far as I am concerned - taking into account my own concept of the construction of Europe and regional development policy in particular - this is a situation which I find unacceptable and I have every intention, as far as possible, with your suppor On the basis of my ideas about the construction of Europe and of regional development policy in particular, I believe that this is a situation which I cannot accept, and I intend, within my possibilities and with your support, to use all the means fo

Learner.blurr_generate works here too

test_de = "Ich trinke gerne Bier"
learn.blurr_generate(test_de)
['I like drinking beer']

Summary

In summary, whether you want to work with Blurr's low-, mid-, or high-level API, we've got you covered :)