Using the high-level Blurr API

This section shows all of the high-level BlearnerFor<Task> classes in action, with the raw data sourced from the Hugging Face Datasets library.

While most of the code and examples in the documentation show how to work with Blurr given a pandas DataFrame, this set of examples shows how to use the high-level Blurr API with any Hugging Face dataset. The high-level API provides one-liners to build your DataBlock, DataLoaders, and Learner (with sensible defaults) from a DataFrame, a CSV file, or a list of dictionaries, which is the form we use here.

Sequence Classification

Multiclassification (one input)

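# The fastai and Blurr names used below (L, IndexSplitter, the Blearner classes, ...) are assumed to be imported already
from datasets import concatenate_datasets, load_dataset

raw_datasets = load_dataset("glue", "cola")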
print(f"{raw_datasets}\n")
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')

Reusing dataset glue (/home/wgilliam/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)

DatasetDict({
    train: Dataset({
        features: ['idx', 'label', 'sentence'],
        num_rows: 8551
    })
    validation: Dataset({
        features: ['idx', 'label', 'sentence'],
        num_rows: 1043
    })
    test: Dataset({
        features: ['idx', 'label', 'sentence'],
        num_rows: 1063
    })
})

{'idx': 0, 'label': 1, 'sentence': "Our friends won't buy this analysis, let alone the next one we propose."}

{'idx': Value(dtype='int32', id=None), 'label': ClassLabel(num_classes=2, names=['unacceptable', 'acceptable'], names_file=None, id=None), 'sentence': Value(dtype='string', id=None)}

Capture the indexes for both the train and validation sets, use the datasets library's concatenate_datasets function to combine them into a single dataset, and then use IndexSplitter to define the train/validation split:

train_ds = raw_datasets["train"]  # .select(range(10000))
valid_ds = raw_datasets["validation"]  # .select(range(2000))

n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])

dl_kwargs = {"bs": 4, "val_bs": 8}
learn_kwargs = {"metrics": [accuracy]}

learn = BlearnerForSequenceClassification.from_data(
    raw_ds,
    "distilroberta-base",
    text_attr="sentence",
    label_attr="label",
    dblock_splitter=IndexSplitter(valid_idxs),
    dl_kwargs=dl_kwargs,
    learner_kwargs=learn_kwargs,
)
learn = learn.to_fp16()
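
The index bookkeeping above is easy to check without Blurr or fastai. The following is a minimal plain-Python sketch (the helper name `split_indexes` is hypothetical, not part of either library) showing that, once the validation rows are appended after the training rows, the validation indexes are simply the range that starts at n_train. The sizes are the CoLA row counts printed earlier.

```python
def split_indexes(n_train, n_valid):
    """Return (train_idxs, valid_idxs) for a dataset built by concatenating
    n_train training rows followed by n_valid validation rows."""
    train_idxs = list(range(n_train))
    valid_idxs = list(range(n_train, n_train + n_valid))
    return train_idxs, valid_idxs


# CoLA train/validation sizes from the DatasetDict printed above
train_idxs, valid_idxs = split_indexes(8551, 1043)

print(len(train_idxs), len(valid_idxs))  # 8551 1043
print(valid_idxs[0], valid_idxs[-1])     # 8551 9593
```

Passing only the validation indexes to IndexSplitter is enough, since every row not listed there ends up in the training split.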

learn.dls.show_batch(dataloaders=learn.dls, trunc_at=500, max_n=5)

	text	target
0	Everybody who has ever, worked in any office which contained any typewriter which had ever been used to type any letters which had to be signed by any administrator who ever worked in any department like mine will know what I mean.	1
1	I watched the Indians who the man who had been my advisor in my freshman year had advised me to study when I got to Utah talk.	0
2	Which packages is it possible that Sam didn't pick up which are to be mailed tomorrow until it had stopped raining?	0
3	Willy is taller than Bill by as much as that Bill is taller than Dan is believed.	0

learn.fit_one_cycle(1, lr_max=2e-3)

epoch	train_loss	valid_loss	accuracy	time
0	0.477010	0.519977	0.746884	00:53

learn.show_results(learner=learn, max_n=5)

	text	target	prediction
0	Scientists at the South Hanoi Institute of Technology have succeeded in raising one dog with five legs, another with a cow's liver, and a third with no head.	1	1
1	The newspaper has reported that they are about to appoint someone, but I can't remember who the newspaper has reported that they are about to appoint.	1	1
2	Sandy is very anxious to see if the students will be able to solve the homework problem in a particular way, but she won't tell us in which way.	1	1
3	Sandy is very anxious to see if the students will be able to solve the homework problem in a particular way, but she won't tell us which.	1	1
4	Put a picture of Bill on your desk before tomorrow, this girl in the red coat will put a picture of Bill on your desk before tomorrow.	0	1

Learner.blurr_predict works here too

learn.blurr_predict("Blurr aint no joke yo")

[{'label': '0',
  'score': 0.6082450747489929,
  'class_index': 0,
  'class_labels': [0, 1],
  'probs': [0.6082450747489929, 0.3917549252510071]}]
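
Judging from the output above, class_index appears to be the position of the largest value in probs, and score is that value. The sketch below checks this on the printed values with plain Python and maps the index back to the CoLA label names shown in the dataset features; it is only an illustration of how to read the result, not a description of Blurr's internals.

```python
# Values copied from the blurr_predict output above
pred = {
    "label": "0",
    "score": 0.6082450747489929,
    "class_index": 0,
    "class_labels": [0, 1],
    "probs": [0.6082450747489929, 0.3917549252510071],
}

# Label names taken from the CoLA ClassLabel feature printed earlier
label_names = ["unacceptable", "acceptable"]

# Position of the highest probability
best_idx = max(range(len(pred["probs"])), key=lambda i: pred["probs"][i])

print(best_idx == pred["class_index"])          # True
print(pred["probs"][best_idx] == pred["score"])  # True
print(label_names[best_idx])                     # unacceptable
```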

Multiclassification (two inputs)

raw_datasets = load_dataset("glue", "mrpc")
print(f"{raw_datasets}\n")
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')

Reusing dataset glue (/home/wgilliam/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)

DatasetDict({
    train: Dataset({
        features: ['idx', 'label', 'sentence1', 'sentence2'],
        num_rows: 3668
    })
    validation: Dataset({
        features: ['idx', 'label', 'sentence1', 'sentence2'],
        num_rows: 408
    })
    test: Dataset({
        features: ['idx', 'label', 'sentence1', 'sentence2'],
        num_rows: 1725
    })
})

{'idx': 0, 'label': 1, 'sentence1': 'Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .', 'sentence2': 'Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .'}

{'idx': Value(dtype='int32', id=None), 'label': ClassLabel(num_classes=2, names=['not_equivalent', 'equivalent'], names_file=None, id=None), 'sentence1': Value(dtype='string', id=None), 'sentence2': Value(dtype='string', id=None)}

train_ds = raw_datasets["train"]  # .select(range(10000))
valid_ds = raw_datasets["validation"]  # .select(range(2000))

n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])

dl_kwargs = {"bs": 4, "val_bs": 8}
learn_kwargs = {"metrics": [F1Score(), accuracy]}

learn = BlearnerForSequenceClassification.from_data(
    raw_ds,
    "distilroberta-base",
    text_attr=["sentence1", "sentence2"],
    label_attr="label",
    dblock_splitter=IndexSplitter(valid_idxs),
    dl_kwargs=dl_kwargs,
    learner_kwargs=learn_kwargs,
)
learn = learn.to_fp16()

learn.dls.show_batch(dataloaders=learn.dls, trunc_at=500, max_n=5)

	text	target
0	" In Iraq, " Sen. Pat Roberts, R-Kan., chairman of the intelligence committee, said on CNN's " Late Edition " Sunday, " we're now fighting an anti-guerrilla... effort. " " In Iraq, " Sen. Pat Roberts ( R-Kan. ), chairman of the intelligence committee, said on CNN's " Late Edition " yesterday, " we're now fighting an anti-guerrilla... effort. "	1
1	Media moguls jostled for position as the deadline for bids for Vivendi Universal's U.S. entertainment empire neared on Monday in an auction of some of Hollywood's best-known assets. Media giant Vivendi Universal has given itself two weeks to sift through offers for its U.S. entertainment empire in a multi-billion dollar auction of some of Hollywood's best-known assets.	1
2	Against the dollar, the euro rose as high as $ 1.1535 -- a fresh four-year high -- in morning trade before standing at $ 1.1518 / 23 at 0215 GMT. Against the dollar, the euro rose as high as $ 1.1537, a fresh four-year high and up a half cent from around $ 1.1480 in late U.S. trade.	0
3	Under the NBC proposal, Vivendi would merge its U.S. film and TV business with NBC's broadcast network, Spanish-language network and cable channels including CNBC and Bravo. Under a deal with General Electric's NBC, Vivendi's film and TV business would merge with NBC's broadcast network, Spanish- language network and cable channels including CNBC and Bravo.	1

learn.fit_one_cycle(1, lr_max=2e-3)

epoch	train_loss	valid_loss	f1_score	accuracy	time
0	0.489009	0.444209	0.861386	0.794118	00:22

learn.show_results(learner=learn, max_n=5)

	text	target	prediction
0	He said the foodservice pie business doesn 't fit the company's long-term growth strategy. " The foodservice pie business does not fit our long-term growth strategy.	1	1
1	" Close co-operation between our law enforcement agencies, close co-operation between our intelligence services lie at the heart of the ongoing fight against terrorism. " Close cooperation between regional law enforcement agencies and intelligence services was at the heart of the fight against terrorism, he said.	1	1
2	They were being held Sunday in the Camden County Jail on $ 100,000 bail. They remained in Camden County Jail on Sunday on $ 100,000 bail.	1	1
3	Sales for the quarter beat expectations, rising 37 percent year-on-year to 1.76 billion euros. Sales rose 37 per cent year-on-year to 1.76bn, beating expectations.	1	1
4	ONG KONG, July 9 Tens of thousands of demonstrators gathered tonight before the legislature building here to call for free elections and the resignation of Hong Kong's leader. Tens of thousands of demonstrators gathered yesterday evening to stand before this city's legislature building and call for free elections and the resignation of Hong Kong's leader.	1	1

Multilabel classification

raw_datasets = load_dataset("civil_comments")
print(f"{raw_datasets}\n")
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')

Using custom data configuration default
Reusing dataset civil_comments (/home/wgilliam/.cache/huggingface/datasets/civil_comments/default/0.9.0/e7a3aacd2ab7d135fa958e7209d10b1fa03807d44c486e3c34897aa08ea8ffab)

DatasetDict({
    train: Dataset({
        features: ['identity_attack', 'insult', 'obscene', 'severe_toxicity', 'sexual_explicit', 'text', 'threat', 'toxicity'],
        num_rows: 1804874
    })
    validation: Dataset({
        features: ['identity_attack', 'insult', 'obscene', 'severe_toxicity', 'sexual_explicit', 'text', 'threat', 'toxicity'],
        num_rows: 97320
    })
    test: Dataset({
        features: ['identity_attack', 'insult', 'obscene', 'severe_toxicity', 'sexual_explicit', 'text', 'threat', 'toxicity'],
        num_rows: 97320
    })
})

{'identity_attack': 0.0, 'insult': 0.0, 'obscene': 0.0, 'severe_toxicity': 0.0, 'sexual_explicit': 0.0, 'text': "This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!", 'threat': 0.0, 'toxicity': 0.0}

{'identity_attack': Value(dtype='float32', id=None), 'insult': Value(dtype='float32', id=None), 'obscene': Value(dtype='float32', id=None), 'severe_toxicity': Value(dtype='float32', id=None), 'sexual_explicit': Value(dtype='float32', id=None), 'text': Value(dtype='string', id=None), 'threat': Value(dtype='float32', id=None), 'toxicity': Value(dtype='float32', id=None)}

lbl_cols = ["identity_attack", "insult", "obscene", "toxicity", "severe_toxicity", "sexual_explicit", "threat"]

train_ds = raw_datasets["train"].select(range(10000))
valid_ds = raw_datasets["validation"].select(range(2000))

n_train, n_valid = len(train_ds), len(valid_ds)
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])

The labels need to be one-hot encoded as ints (the raw data has them as floats). We could also do this kind of preprocessing by passing a preprocess_func to our BlearnerForSequenceClassification factory method, which is especially useful if the preprocessing depends on one or more of the Hugging Face objects (e.g., config, tokenizer, model, architecture).

def make_ohe(item):
    for k in item.keys():
        if k in lbl_cols:
            item[k] = int(np.round(item[k]))
    return item


raw_ds = raw_ds.map(make_ohe)
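
To make the effect of the rounding concrete, here is a small plain-Python sketch of the same idea applied to a single record. The helper name `to_int_labels` and the example record are made up for illustration; only the label column names come from the code above. It uses the built-in round, which, like np.round, rounds exact halves to the nearest even integer.

```python
lbl_cols = ["identity_attack", "insult", "obscene", "toxicity", "severe_toxicity", "sexual_explicit", "threat"]


def to_int_labels(item, label_cols):
    """Return a copy of item with every label column rounded to 0 or 1."""
    return {k: (int(round(v)) if k in label_cols else v) for k, v in item.items()}


# Hypothetical record; the scores are invented for this example
example = {"text": "a made-up comment", "toxicity": 0.83, "insult": 0.6, "threat": 0.1, "obscene": 0.5}
converted = to_int_labels(example, lbl_cols)

print(converted["toxicity"], converted["insult"], converted["threat"], converted["obscene"])  # 1 1 0 0
print(converted["text"])  # a made-up comment
```

A score of exactly 0.5 becomes 0 under this rule, so a different threshold would need an explicit comparison instead of rounding.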

dl_kwargs = {"bs": 4, "val_bs": 8}
learn_kwargs = {"metrics": [F1ScoreMulti(), accuracy_multi]}

# using a List[dict] such as a Hugging Face dataset
learn = BlearnerForSequenceClassification.from_data(
    raw_ds,
    "distilroberta-base",
    text_attr="text",
    label_attr=lbl_cols,
    dblock_splitter=IndexSplitter(valid_idxs),
    dl_kwargs=dl_kwargs,
    learner_kwargs=learn_kwargs,
)
learn = learn.to_fp16()

learn.dls.show_batch(dataloaders=learn.dls, trunc_at=500, max_n=5)

	text	target
0	I have had a question about Einstein's Special Theory of Relativity for some time which scientists all seem to run away from. Until 1887 the equations used for Relativity were the Galilean transformation equations.\n\n x'=x-vt\n y'=y\n z'=z\n t'=t\n\nAfter 1887, scientists threw away the Galilean transfo	[]
1	To Maintain The Status Quo Of A 12% Road Repair Deficit Over The Next Five Years\nFY 16/17 = $11,400,000 + $43,600,000 = $55,00,000\nFY 17/18 = $11,400,000 + $43,600,000 = $55,00,000\nFY 18/19 = $11,400,000 + $43,600,000 = $55,00,000\nFY 19/20 = $11,400,000 + $43,600,000 = $55,00,000\nFY 20/21 = $11,400,000 + $43,600,000 = $55,00,000\nTotal over 5 years = $275,000,000\n\n1. The Portland city council shall adopt a five year road repair budget agreement that uses funds already available in the general fu	[]
2	"Voodoo Journalism: Dr. Krugman Strikes Again—Risking His Credibility":\n\n"The fact that Sanders’ ethical platform does, in fact, result in economic gains shouldn’t be surprising. Empirical evidence shows that the 3 times we’ve adopted a laisez-fair approach to regulating the economy, it has resulted in extreme income inequality leading inevitably to the 3 biggest economic disruptions in US history: the Depression in 1890’s, the Great Depression in 1930’s, and the Great Recession in 2008. This s	[]
3	Again, interesting. First, I don’t think anyone in this discussion has self-defined as “far left.” I would, for myself, as I believe in universal health care, nationalizing all utilities—including oil companies and big banks, free education (PK-grad), and 1950s tax rates (91% marginal tax rate is about right). NOBODY in Congress is proposing any of those, so there really is no “far left” representation in Congress.\nRe the marriage equality issue. What is RIGHT with it is that it guarantees	[]

learn.fit_one_cycle(1, lr_max=2e-3)

epoch	train_loss	valid_loss	f1_score	accuracy_multi	time
0	0.029894	0.047098	0.098025	0.986285	01:10

/home/wgilliam/miniconda3/envs/blurr/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1580: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))

learn.show_results(learner=learn, trunc_at=500, max_n=5)

	text	target	prediction
0	Everyone tries to hack everyone else. I have no doubt Russia would try to hack even canada. However, the US has been doing the same, if we recall Snowden.\n\nEven Merkel's phone conversations were being tapped by the CIA. \n\nThe real purpose of this issue is political. Trump is upset because people are trying to imply that he didn't deserve his victory, that the Russians helped him. It's an ego thing. Good CEOs sometimes have giant egos. I have no problem with that as long as they produce results, I gladly buy shares in their company.\n\nOtoh, Russia did invade Crimea recently, and their missile brought down a commercial airliner and killed lots of innocent people. The world has a right to be annoyed at the Russians.\n\nIf you want to find evidence of Russians hacking, you will find them. But if you want to find China or some guy in a basement somewhere, I have no doubt you can find the same as well. Whether they succeeded or not, that's hard to prove, but there's lots of blackhats	[]	[]
1	"We will stand by the Governor as he searches for answers to the crime wave." says Senator Kelly.\n Where the heck has Kelly been the last three years as the crime wave grew?. And just what is your job Kelly?..if the Governor is doing the one doing the searching? Oh I remember,..it`s to continue to be against broad-based taxes once again for Alaska, for any reasonable fix he comes up with, as your Senate caucus has said for four special sessions "there will be no tax bill to raise revenue" for whatever "fixes" you say we need, and that the Governor searches,,..without you apparently. They say in the media your back-pedaling now, that your now willing to have the debate over the need for a tax to put this state on a balanced keel going forward. We`ll see if your oily conflicted fellow senators agree. Voters are watching, and want a plan to fix this crime/budget issue. If it takes more cops and new taxes to get it done then let's do it. We had a tax before and nobody died from it.	[]	[]
2	Mr. Alali, I am sympathetic to your position and feelings. As a Canadian I hold no ill will towards you or your family relocating to Canada. You should be aware that you and your family have been used as political pawns following the glib and ill conceived election promise made by our Prime Minister to bring 25 thousand of your compatriots to Canada by the end of 2015. It sounds as if Mr McCallum and assistants scoured UN refugee lists in an effort to press gang hapless individuals and coerce them into settling and being shipped to Canada. The expedition of your arrival was made with no regard to the logistics of accommodating vulnerable and traumatized families in a respectful and decent manner following your staged and publicized arrivals. As for your future in Canada I fear you will be lucky to find some subsistence level employment. The chances are that your children, if you allow them to assimilate into the Canadian culture, will thrive and have a rewarding life in here. Good Luck	[]	[]
3	I abjure violence of any kind and this includes violence propagated through money by those who have the means to do so. I believe in free speech except when used to incite violence or hatred. I believe in the right of individual freedoms providing harm to others is not caused.. This is my bias.\n\nOthers believe an economic and social order that favours a few is natural to human nature and that those who can gain advantage, without consideration of harm to others, should be allowed to do so. That is their bias.\n\nEither view will result in bad behaviour by either side depending on who is ascendant. Trump was not against suppressing free speech or condoning violence as one could see during his campaign rallies.\n\nThe world will never be perfect or fair but beginning with Roosevelt and ending with Reagan there was a time when average people could see a slow but steady increase in living standards and opportunity. The rise of neo-liberal policies has removed this expectation for most.	[]	[]
4	Regarding "Lonely Woman" Yes, involve herself in organizations but perhaps not a church. I enjoy my church and it is important but... Many have gossips. When living in Denver my best friend Linda and I attended together for years and I was a maid of honer in her wedding. When they moved, I attended alone and some wives began speculating I was looking for a husband. Some of the single guys heard variations of that speculation. During service a guy slid across to me and began flirting. I whispered loud enough for all around me to hear, "Who are you! Get lost!" The minister heard. He took a second to force down a laugh and nodded at me. The guy slithered away. The gossips saw and were nice to me afterward but that was my last day. I still saw friends who assured me it was only a few bad apples. Point being to "Lonely Woman" Some churches have gossip mongrels who ruin it for young, decent women. Here's an oldie but goodie for you gossips. "Do not bare false witness against your neighbor."	[]	[]

Token Classification

raw_datasets = load_dataset("germeval_14")
print(f"{raw_datasets}\n")
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')

Reusing dataset germ_eval14 (/home/wgilliam/.cache/huggingface/datasets/germ_eval14/germeval_14/2.0.0/0f174b84866aa3b8ebae65c271610520be4422405d7e8467bd24cfd493d325f0)

DatasetDict({
    train: Dataset({
        features: ['id', 'ner_tags', 'nested_ner_tags', 'source', 'tokens'],
        num_rows: 24000
    })
    validation: Dataset({
        features: ['id', 'ner_tags', 'nested_ner_tags', 'source', 'tokens'],
        num_rows: 2200
    })
    test: Dataset({
        features: ['id', 'ner_tags', 'nested_ner_tags', 'source', 'tokens'],
        num_rows: 5100
    })
})

{'id': '0', 'ner_tags': [19, 0, 0, 0, 7, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'nested_ner_tags': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'source': 'n-tv.de vom 26.02.2005 [2005-02-26] ', 'tokens': ['Schartau', 'sagte', 'dem', '"', 'Tagesspiegel', '"', 'vom', 'Freitag', ',', 'Fischer', 'sei', '"', 'in', 'einer', 'Weise', 'aufgetreten', ',', 'die', 'alles', 'andere', 'als', 'überzeugend', 'war', '"', '.']}

{'id': Value(dtype='string', id=None), 'ner_tags': Sequence(feature=ClassLabel(num_classes=25, names=['O', 'B-LOC', 'I-LOC', 'B-LOCderiv', 'I-LOCderiv', 'B-LOCpart', 'I-LOCpart', 'B-ORG', 'I-ORG', 'B-ORGderiv', 'I-ORGderiv', 'B-ORGpart', 'I-ORGpart', 'B-OTH', 'I-OTH', 'B-OTHderiv', 'I-OTHderiv', 'B-OTHpart', 'I-OTHpart', 'B-PER', 'I-PER', 'B-PERderiv', 'I-PERderiv', 'B-PERpart', 'I-PERpart'], names_file=None, id=None), length=-1, id=None), 'nested_ner_tags': Sequence(feature=ClassLabel(num_classes=25, names=['O', 'B-LOC', 'I-LOC', 'B-LOCderiv', 'I-LOCderiv', 'B-LOCpart', 'I-LOCpart', 'B-ORG', 'I-ORG', 'B-ORGderiv', 'I-ORGderiv', 'B-ORGpart', 'I-ORGpart', 'B-OTH', 'I-OTH', 'B-OTHderiv', 'I-OTHderiv', 'B-OTHpart', 'I-OTHpart', 'B-PER', 'I-PER', 'B-PERderiv', 'I-PERderiv', 'B-PERpart', 'I-PERpart'], names_file=None, id=None), length=-1, id=None), 'source': Value(dtype='string', id=None), 'tokens': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)}

train_ds = raw_datasets["train"].select(range(1000))
valid_ds = raw_datasets["validation"].select(range(500))

n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])

We can grab the “labels” a token can be associated with, as we do here, or we can let the BlearnerForTokenClassification factory methods figure them out for us.

labels = train_ds.features["ner_tags"].feature.names
len(labels)

Because we need to pass the tag (not the index) for each example’s tokens in a list, we use the handy datasets map function to create a new attribute, “token_labels”, with that data. This could also be done by passing a preprocess_func to a BlearnerForTokenClassification factory method, which is especially useful if we need one or more of the Hugging Face objects (e.g., tokenizer, model, config, or architecture name).

def get_item_labels(example):
    example["token_labels"] = [labels[tag_idx] for tag_idx in example["ner_tags"]]
    return example


raw_ds = raw_ds.map(get_item_labels)
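
As a quick check of what this mapping produces, the sketch below applies the same list comprehension to the first few tokens of the training example printed earlier, using the label names from the ner_tags feature. It is plain Python and does not touch the dataset itself.

```python
# Label names copied from the ner_tags ClassLabel feature printed above
labels = [
    "O", "B-LOC", "I-LOC", "B-LOCderiv", "I-LOCderiv", "B-LOCpart", "I-LOCpart",
    "B-ORG", "I-ORG", "B-ORGderiv", "I-ORGderiv", "B-ORGpart", "I-ORGpart",
    "B-OTH", "I-OTH", "B-OTHderiv", "I-OTHderiv", "B-OTHpart", "I-OTHpart",
    "B-PER", "I-PER", "B-PERderiv", "I-PERderiv", "B-PERpart", "I-PERpart",
]

# First five tokens and tag indexes of the first training example
example = {
    "tokens": ["Schartau", "sagte", "dem", '"', "Tagesspiegel"],
    "ner_tags": [19, 0, 0, 0, 7],
}

# Same transformation as get_item_labels, applied to a plain dict
example["token_labels"] = [labels[tag_idx] for tag_idx in example["ner_tags"]]

print(list(zip(example["tokens"], example["token_labels"])))
```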

learn = BlearnerForTokenClassification.from_data(
    raw_ds,
    "distilroberta-base",
    tokens_attr="tokens",
    token_labels_attr="ner_tags",
    labels=labels,
    dl_kwargs={"bs": 2},
)

learn.unfreeze()

learn.dls.show_batch(dataloaders=learn.dls, max_n=2)

	word / target label
0	[('Helbig', 'B-OTH'), ('et', 'I-OTH'), ('al', 'I-OTH'), ('.', 'O'), ('(', 'O'), ('1994', 'O'), (')', 'O'), ('S.', 'O'), ('593.', 'O'), ('Wink', 'B-OTH'), ('&', 'I-OTH'), ('Seibold', 'I-OTH'), ('et', 'I-OTH'), ('al', 'I-OTH'), ('.', 'O'), ('(', 'O'), ('1998', 'O'), (')', 'O'), ('S.', 'O'), ('32', 'O'), ('Inwieweit', 'O'), ('noch', 'O'), ('andere', 'O'), ('Falken', 'O'), (',', 'O'), ('wie', 'O'), ('der', 'O'), ('Afrikanische', 'B-LOCderiv'), ('Baumfalke', 'O'), ('(', 'O'), ('Falco', 'O'), ('cuvieri', 'O'), (')', 'O'), ('oder', 'O'), ('der', 'O'), ('Malaienbaumfalke', 'O'), ('(', 'O'), ('Falco', 'O'), ('serverus', 'O'), (')', 'O'), ('dieser', 'O'), ('Gruppe', 'O'), ('zuzuzählen', 'O'), ('sind', 'O'), (',', 'O'), ('ist', 'O'), ('Gegenstand', 'O'), ('der', 'O'), ('Forschung', 'O'), ('.', 'O')]
1	[('Erstmals', 'O'), ('Urkundlich', 'O'), ('erwähnt', 'O'), ('ist', 'O'), ('Nimburg', 'B-LOC'), ('bereits', 'O'), ('im', 'O'), ('Jahre', 'O'), ('977', 'O'), ('.', 'O'), ('Im', 'O'), ('ausgehenden', 'O'), ('11.', 'O'), ('Jahrhundert', 'O'), ('werden', 'O'), ('die', 'O'), ('Grafen', 'O'), ('von', 'O'), ('Nimburg', 'B-LOC'), ('erwähnt', 'O'), (',', 'O'), ('die', 'O'), ('Gefolgsleute', 'O'), ('der', 'O'), ('in', 'O'), ('jener', 'O'), ('Zeit', 'O'), ('mächtigen', 'O'), ('Herzöge', 'O'), ('von', 'O'), ('Zähringen', 'O'), ('und', 'O'), ('unter', 'O'), ('anderem', 'O'), ('auch', 'O'), ('Teilnehmer', 'O'), ('der', 'O'), ('Kreuzzüge', 'O'), ('waren', 'O'), ('.', 'O')]

learn.fit_one_cycle(1, lr_max=3e-5, moms=(0.8, 0.7, 0.8), cbs=[BlearnerForTokenClassification.get_metrics_cb()])

epoch	train_loss	valid_loss	accuracy	precision	recall	f1	time
0	0.287480	0.260162	0.934187	0.422222	0.548077	0.476987	00:28

/home/wgilliam/miniconda3/envs/blurr/lib/python3.9/site-packages/seqeval/metrics/v1.py:57: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

learn.show_results(learner=learn, max_n=2, trunc_at=10)

	token / target label / predicted label
0	[('Wenn', 'O', 'O'), ('man', 'O', 'O'), ('gegen', 'O', 'O'), ('so', 'O', 'O'), ('eine', 'O', 'O'), ('Mannschaft', 'O', 'O'), ('trifft', 'O', 'O'), (',', 'O', 'O'), ('hat', 'O', 'O'), ('man', 'O', 'O')]
1	[('Die', 'O', 'O'), ('Flügel', 'O', 'O'), ('Die', 'O', 'O'), ('geöffneten', 'O', 'O'), ('Flügel', 'O', 'O'), ('zeigen', 'O', 'O'), ('in', 'O', 'O'), ('vier', 'O', 'O'), ('Szenen', 'O', 'O'), ('Höhepunkte', 'O', 'O')]

print(learn.token_classification_report)

              precision    recall  f1-score   support

         LOC       0.62      0.51      0.56       129
    LOCderiv       0.49      0.72      0.58        25
     LOCpart       0.00      0.00      0.00         0
         ORG       0.22      0.27      0.24        60
     ORGpart       0.00      0.00      0.00         0
         OTH       0.04      0.67      0.08         3
    OTHderiv       0.00      0.00      0.00         0
     OTHpart       0.00      0.00      0.00         0
         PER       0.69      0.73      0.71        95
    PERderiv       0.00      0.00      0.00         0
     PERpart       0.00      0.00      0.00         0

   micro avg       0.42      0.55      0.48       312
   macro avg       0.19      0.26      0.20       312
weighted avg       0.55      0.55      0.54       312
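
The gap between the macro and weighted averages comes from the labels with zero support. The following sketch recomputes both precision averages from the per-label rows of the report above; the numbers agree with the report if the macro average is taken over all 11 labels, which is an inference from the figures rather than something stated in the output.

```python
# (label, precision, support) rows copied from the report above
rows = [
    ("LOC", 0.62, 129), ("LOCderiv", 0.49, 25), ("LOCpart", 0.00, 0),
    ("ORG", 0.22, 60), ("ORGpart", 0.00, 0), ("OTH", 0.04, 3),
    ("OTHderiv", 0.00, 0), ("OTHpart", 0.00, 0), ("PER", 0.69, 95),
    ("PERderiv", 0.00, 0), ("PERpart", 0.00, 0),
]

total_support = sum(support for _, _, support in rows)

# Macro average: every label counts equally, including labels with no examples
macro_precision = sum(precision for _, precision, _ in rows) / len(rows)

# Weighted average: each label counts in proportion to its support
weighted_precision = sum(precision * support for _, precision, support in rows) / total_support

print(total_support)                 # 312
print(round(macro_precision, 2))     # 0.19
print(round(weighted_precision, 2))  # 0.55
```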

Learner.blurr_predict_tokens works here too

txt = "I live in California, but I'd love to travel to Scotland and visit the Macallan distillery."
txt2 = "Jane Doe loves working for ohmeow.com."

results = learn.predict([txt, txt2])
for res in results:
    print(f"{res}\n")

[{'entity_group': 'LOC', 'score': 0.2773939371109009, 'word': 'in', 'start': 7, 'end': 9}, {'entity_group': 'LOC', 'score': 0.354379802942276, 'word': 'California', 'start': 10, 'end': 20}, {'entity_group': 'LOC', 'score': 0.2855808138847351, 'word': 'to', 'start': 45, 'end': 47}, {'entity_group': 'LOC', 'score': 0.41248050332069397, 'word': 'Scotland', 'start': 48, 'end': 56}, {'entity_group': 'LOC', 'score': 0.26863470673561096, 'word': 'Mac', 'start': 71, 'end': 74}, {'entity_group': 'LOC', 'score': 0.17475469410419464, 'word': 'all', 'start': 74, 'end': 77}, {'entity_group': 'LOC', 'score': 0.11444361507892609, 'word': 'an', 'start': 77, 'end': 79}]

[{'entity_group': 'PER', 'score': 0.6308206021785736, 'word': 'Jane Doe', 'start': 0, 'end': 8}]

Question Answering

raw_datasets = load_dataset("squad_v2")
print(f"{raw_datasets}\n")
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')

Reusing dataset squad_v2 (/home/wgilliam/.cache/huggingface/datasets/squad_v2/squad_v2/2.0.0/09187c73c1b837c95d9a249cd97c2c3f1cebada06efe667b4427714b27639b1d)

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 130319
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 11873
    })
})

{'id': '56be85543aeaaa14008c9063', 'title': 'Beyoncé', 'context': 'Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".', 'question': 'When did Beyonce start becoming popular?', 'answers': {'text': ['in the late 1990s'], 'answer_start': [269]}}

{'id': Value(dtype='string', id=None), 'title': Value(dtype='string', id=None), 'context': Value(dtype='string', id=None), 'question': Value(dtype='string', id=None), 'answers': Sequence(feature={'text': Value(dtype='string', id=None), 'answer_start': Value(dtype='int32', id=None)}, length=-1, id=None)}

train_ds = raw_datasets["train"].select(range(1000))
train_df = train_ds.to_pandas()

Extractive question answering tasks require preprocessing, which we’ll apply before creating our BlearnerForQuestionAnswering.

train_df["ans_start_char_idx"] = train_df.answers.apply(lambda v: v["answer_start"][0] if len(v["answer_start"]) > 0 else 0)
train_df["answer_text"] = train_df.answers.apply(lambda v: v["text"][0] if len(v["text"]) > 0 else "")
train_df["ans_end_char_idx"] = train_df["ans_start_char_idx"].astype(int) + train_df["answer_text"].str.len()
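
The three columns above boil down to one rule: the end character index is the start index plus the length of the answer text, and an unanswerable question gets 0, 0 and an empty string. Here is a minimal sketch of that rule in plain Python; the helper name `answer_span` and the toy context are made up for illustration, while the last call uses the answer from the SQuAD record printed earlier.

```python
def answer_span(answers):
    """Return (start_char, end_char, text) for the first answer,
    or (0, 0, "") when the question has no answer."""
    starts, texts = answers["answer_start"], answers["text"]
    if len(starts) == 0:
        return 0, 0, ""
    start, text = int(starts[0]), texts[0]
    return start, start + len(text), text


# Toy context so the slice can be checked by eye
context = "The quick brown fox jumps over the lazy dog."
start, end, text = answer_span({"text": ["brown fox"], "answer_start": [10]})
print(start, end, context[start:end])  # 10 19 brown fox

# Unanswerable questions have empty lists in SQuAD v2
print(answer_span({"text": [], "answer_start": []}))  # (0, 0, '')

# Answer from the training example printed above
print(answer_span({"text": ["in the late 1990s"], "answer_start": [269]})[:2])  # (269, 286)
```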

from blurr.text.data.question_answering import QAPreprocessor

pretrained_model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
hf_tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)

# preprocess
tok_kwargs = {"return_overflowing_tokens": True, "max_length": 128, "stride": 24}
preprocessor = QAPreprocessor(hf_tokenizer, id_attr="id", tok_kwargs=tok_kwargs)
proc_df = preprocessor.process_df(train_df)

# build our `Learner`
learn = BlearnerForQuestionAnswering.from_data(
    proc_df, pretrained_model_name, max_seq_len=128, dblock_splitter=RandomSplitter(), dl_kwargs={"bs": 4}
)
learn = learn.to_fp16()
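
The tok_kwargs above ask the tokenizer to split long inputs into overlapping chunks, where stride is the number of tokens shared by consecutive chunks. The sketch below illustrates that windowing on a plain list of integers. It is a simplification under stated assumptions: the function `sliding_windows` is hypothetical, and a real tokenizer also keeps the question in every chunk and reserves room for special tokens, which this sketch ignores.

```python
def sliding_windows(tokens, max_length, stride):
    """Split tokens into chunks of at most max_length items,
    with stride items of overlap between consecutive chunks."""
    if not 0 <= stride < max_length:
        raise ValueError("stride must be non-negative and smaller than max_length")
    step = max_length - stride
    windows, start = [], 0
    while True:
        windows.append(tokens[start:start + max_length])
        if start + max_length >= len(tokens):
            break  # this chunk already reaches the end of the sequence
        start += step
    return windows


# 300 stand-in token ids, chunked with the same max_length and stride as tok_kwargs
windows = sliding_windows(list(range(300)), max_length=128, stride=24)

print(len(windows))                  # 3
print([len(w) for w in windows])     # [128, 128, 92]
print([w[0] for w in windows])       # [0, 104, 208]
```

One long context can therefore produce several rows in the processed DataFrame, each of which is a separate training example.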

learn.dls.show_batch(dataloaders=learn.dls, max_n=2, trunc_at=500)

	text	found	start/end	answer
0	which prominent star felt the 2009 female video of the year award should have went to beyonce instead of taylor swift? on april 4, 2008, beyonce married jay z. she publicly revealed their marriage in a video montage at the listening party for her third studio album, i am... sasha fierce, in manhattan's sony club on october 22, 2008. i am... sasha fierce was released on november 18, 2008 in the united states. the album formally introduces beyonce's alter ego sasha fierce, conceived during the mak	True	(74, 76)	. i
1	for which decade, did beyonce have more top ten songs than any other woman? on april 4, 2008, beyonce married jay z. she publicly revealed their marriage in a video montage at the listening party for her third studio album, i am... sasha fierce, in manhattan's sony club on october 22, 2008. i am... sasha fierce was released on november 18, 2008 in the united states. the album formally introduces beyonce's alter ego sasha fierce, conceived during the making of her 2003 single " crazy in love ", s	False	(0, 0)

learn.fit_one_cycle(1, lr_max=1e-3)

epoch	train_loss	valid_loss	time
0	2.262114	1.950846	01:09

learn.show_results(learner=learn, skip_special_tokens=True, max_n=2, trunc_at=500)

	text	found	start/end	answer	pred start/end	pred answer
0	where was beyonce's first public performance after giving birth? on january 7, 2012, beyonce gave birth to her first child, a daughter, blue ivy carter, at lenox hill hospital in new york. five months later, she performed for four nights at revel atlantic city's ovation hall to celebrate the resort's opening, her first performances since giving birth to blue ivy.	True	(54, 63)	revel atlantic city's ovation hall	(54, 0)
1	what was the name of beyonce's first dance instructor? beyonce attended st. mary's elementary school in fredericksburg, texas, where she enrolled in dance classes. her singing talent was discovered when dance instructor darlette johnson began humming a song and she finished it, able to hit the high - pitched notes. beyonce's interest in music and performing continued after winning a school talent show at age seven, singing john lennon's " imagine " to beat 15 / 16 - year - olds. in fall of 1990,	False	(0, 0)		(44, 0)

Language modeling

raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")
print(f"{raw_datasets}\n")
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')

Downloading and preparing dataset wikitext/wikitext-2-raw-v1 (download: 4.50 MiB, generated: 12.90 MiB, post-processed: Unknown size, total: 17.40 MiB) to /home/wgilliam/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126...

Dataset wikitext downloaded and prepared to /home/wgilliam/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126. Subsequent calls will reuse this data.

DatasetDict({
    test: Dataset({
        features: ['text'],
        num_rows: 4358
    })
    train: Dataset({
        features: ['text'],
        num_rows: 36718
    })
    validation: Dataset({
        features: ['text'],
        num_rows: 3760
    })
})

{'text': ''}

{'text': Value(dtype='string', id=None)}

# use a small subset of each split to keep the example fast
train_ds = raw_datasets["train"].select(range(1000))
valid_ds = raw_datasets["validation"].select(range(1000))

# after concatenation the validation rows come directly after the training rows
n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])
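The index bookkeeping above is easy to check with plain lists. A small sketch, with ordinary Python lists standing in for the Hugging Face datasets and fastcore's `L` (the `contiguous_split` helper is illustrative only):

```python
def contiguous_split(n_train: int, n_valid: int) -> tuple:
    """Index ranges for a train set followed directly by a validation set."""
    train_idxs = list(range(n_train))
    valid_idxs = list(range(n_train, n_train + n_valid))
    return train_idxs, valid_idxs


# Stand-ins for the two dataset splits.
train_rows = [f"train-{i}" for i in range(5)]
valid_rows = [f"valid-{i}" for i in range(3)]
all_rows = train_rows + valid_rows  # same ordering as concatenate_datasets([train, valid])

train_idxs, valid_idxs = contiguous_split(len(train_rows), len(valid_rows))
picked_valid = [all_rows[i] for i in valid_idxs]  # exactly the validation rows
```

Because the validation indexes are positions in the concatenated dataset, any later step that adds or drops rows would shift them.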

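def remove_empty_text(example):
    # Despite the name, no rows are dropped: blank text is replaced with a
    # whitespace placeholder, so the row count and split indexes stay unchanged.
    if example["text"].strip() == "":
        example["text"] = "  "
    return example


raw_ds = raw_ds.map(remove_empty_text)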

Causal language modeling

learn = BlearnerForLM.from_data(
    raw_ds, "gpt2", text_attr="text", lm_strategy_cls=CausalLMStrategy, dblock_splitter=IndexSplitter(valid_idxs), dl_kwargs={"bs": 2}
).to_fp16()

Using pad_token, but it is not set yet.

learn.dls.show_batch(dataloaders=learn.dls, max_n=2, trunc_at=250)

	text	target
0	A lookout aboard Weehawken spotted Atlanta at 04 : 10 on the morning of 17 June. When the latter ship closed to within about 1 @.@ 5 miles ( 2 @.@ 4 km ) of the two Union ships, she fired one round from her bow gun that passed over Weehawken and lan	lookout aboard Weehawken spotted Atlanta at 04 : 10 on the morning of 17 June. When the latter ship closed to within about 1 @.@ 5 miles ( 2 @.@ 4 km ) of the two Union ships, she fired one round from her bow gun that passed over Weehawken and lande
1	The music was composed by Hitoshi Sakimoto, who had also worked on the previous Valkyria Chronicles games. When he originally heard about the project, he thought it would be a light tone similar to other Valkyria Chronicles games, but found the them	music was composed by Hitoshi Sakimoto, who had also worked on the previous Valkyria Chronicles games. When he originally heard about the project, he thought it would be a light tone similar to other Valkyria Chronicles games, but found the themes m
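In the batch above, each target is the input shifted left by one token: the text starts with "A lookout" and its target starts with "lookout". A minimal sketch of that pairing, using a whitespace split as a stand-in for GPT-2's real subword tokenizer (the `causal_lm_pair` helper is illustrative, not a Blurr function):

```python
def causal_lm_pair(tokens: list) -> tuple:
    """Split a token sequence into next-token-prediction inputs and targets."""
    inputs = tokens[:-1]   # everything except the last token
    targets = tokens[1:]   # the same sequence shifted left by one position
    return inputs, targets


tokens = "A lookout aboard Weehawken spotted Atlanta".split()
inputs, targets = causal_lm_pair(tokens)

# At every position i the model sees inputs[: i + 1] and must predict targets[i].
pairs = list(zip(inputs, targets))
```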

learn.fit_one_cycle(1, lr_max=3e-4, cbs=[BlearnerForLM.get_metrics_cb()])

epoch	train_loss	valid_loss	perplexity	lm_accuracy	time
0	4.627799	4.504875	90.457054	0.268662	00:35
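The perplexity column is consistent with the exponential of the validation loss, which is the usual definition when the loss is a mean cross-entropy in nats. A quick check against the numbers reported above:

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity of a language model given its mean cross-entropy loss in nats."""
    return math.exp(mean_cross_entropy)


valid_loss = 4.504875          # valid_loss from the training table above
ppl = perplexity(valid_loss)   # lands very close to the reported 90.457054
```

A lower validation loss therefore always means a lower perplexity, so the two columns carry the same information on different scales.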

learn.show_results(learner=learn, max_n=2, trunc_at=500)

	text	target	prediction
0	Meridian is rightly considered an architectural treasure trove being one the nations most intact cities from the turn of the last century. Architecture students from around the nation and Canada are known to visit Meridian in groups as part of their coursework due to numerous structures in the city having been designed by noted architects. The only home in the US south designed by noted Canadian born Architect Louis S. Curtiss, famous for inventing the glass curtain wall skyscraper, is extant o	is rightly considered an architectural treasure trove being one the nations most intact cities from the turn of the last century. Architecture students from around the nation and Canada are known to visit Meridian in groups as part of their coursework due to numerous structures in the city having been designed by noted architects. The only home in the US south designed by noted Canadian born Architect Louis S. Curtiss, famous for inventing the glass curtain wall skyscraper, is extant on Highlan	\n a the.. of of and. the time of the century century.\n is are the the world are the are for be the, the of well of the academic.. to the historical and the city. been built by and architect and\n city of the world of of by renowned architects architect and.a.Siss. is for hising the first and,,raper, famous known in the Park, The onlyfort, by Mile,, located considered an of the most buildingsistico buildingsrapers in the world, is generally considered to the'ss Three Threeman. The only American ar
1	Meridian is served by the Meridian @-@ Lauderdale County Public Library, located at the corner of 7th Street and 26th Avenue. The city originally had two Carnegie libraries, both built in 1913 – one for blacks and one for whites. A group of women had formed the Fortnightly Book and Magazine Club in the 1880s and began raising money to build a library for the city. The books they collected and shared within the club were later the basis of the library collection for Meridian. With wide support f	is served by the Meridian @-@ Lauderdale County Public Library, located at the corner of 7th Street and 26th Avenue. The city originally had two Carnegie libraries, both built in 1913 – one for blacks and one for whites. A group of women had formed the Fortnightly Book and Magazine Club in the 1880s and began raising money to build a library for the city. The books they collected and shared within the club were later the basis of the library collection for Meridian. With wide support for the li	\n a by the United Hotel The -,, Library. and at: intersection ofth and andth Avenue,\n Meridian of opened a lakes Mellon, located of by the. the in the and one for whites. The new of people, been the " Lauderdale @, Club Book,, citys. 1890 publishing money for a new. blacks city. The city were were were published with the city were the used book of the... the. The the from the city, the city was the,, a-, to the library libraryic, Carnegie, the the in The city was the and located in theth Street

Learner.blurr_generate works here too

learn.blurr_generate("Blurr is fun to work with because", max_length=50, do_sample=True, top_k=25)

[{'generated_texts': ' Blurr is fun to work with because and\n and the and the and the and of the and.\n\nThere will be some of those with those who are not with us and some of those who are.\nand the'}]
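The call above passes `do_sample=True` and `top_k=25`, which are standard Hugging Face generation arguments: at each step only the 25 most likely tokens remain candidates, and one of them is drawn at random. A conceptual sketch of a single sampling step over raw logits. The `top_k_sample` helper is hypothetical and is not how the library implements it internally:

```python
import math
import random


def top_k_sample(logits: list, k: int, rng: random.Random) -> int:
    """Sample one token id from the k highest-scoring logits."""
    # Keep only the ids of the k largest logits.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights over the survivors (shifted by the max for numerical stability).
    max_logit = max(logits[i] for i in top_ids)
    weights = [math.exp(logits[i] - max_logit) for i in top_ids]
    return rng.choices(top_ids, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [0.1, 5.0, 4.0, -1.0]
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(200)]  # only ids 1 and 2 can appear
```

With a model trained for a single epoch on 1,000 rows, sampling produces the kind of repetitive text shown above. A smaller `top_k` makes the output more conservative.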

Masked language modeling

learn = BlearnerForLM.from_data(
    raw_ds,
    "bert-base-uncased",
    text_attr="text",
    lm_strategy_cls=BertMLMStrategy,
    dblock_splitter=IndexSplitter(valid_idxs),
    dl_kwargs={"bs": 2},
).to_fp16()

learn.fit_one_cycle(1, lr_max=3e-4, cbs=[BlearnerForLM.get_metrics_cb()])

epoch	train_loss	valid_loss	perplexity	lm_accuracy	time
0	1.062873	0.823877	2.279321	0.676234	00:32

learn.show_results(learner=learn, max_n=2, trunc_at=500)

	text	target	prediction
0	meridian is right ##ly considered an architectural treasure tr ##ove being one the nations most intact cities from [meta] turn of the last century [MASK] architecture students from around [MASK] nation [baptiste] canada are known to [MASK] meridian in [MASK] as part of their course ##work due to numerous structures in the city having been designed by noted architects . the only home in the us south [MASK] by noted [MASK] born architect [MASK] s . curtiss , famous for [MASK] ##venting the glass curtain wall skyscraper [MASK] [MASK] extant on highland park . the frank fort designed [MASK] [MASK] building is generally considered one of the best [MASK] deco skyscraper [MASK] [MASK] the us and is [MASK] compared to [detroit] ' s famed fisher building [##top] noted california [MASK] wallace ne ##ff designed a number of homes [MASK] [MASK] as well [MASK] in [MASK] alabama black [MASK] [which] [MASK] ##jo ##ins the city across the [MASK] [MASK] state line [.] he had relatives in meridian and selma who were [MASK] in the then thriving railroad industry and would take commissions [MASK] the area when [MASK] [discipline] california were lean . [MASK] work is mostly [concentrated] in the lower numbered blocks of pop ##lar springs drive where his 251 ##6 pop ##lar [MASK] [wheeler] is [often] compared to the similarly designed falcon lair [MASK] the beverly hills [MASK] in benedict canyon of rudolph valentin ##o . [MASK] ne ##ff work was lost to an expansion of anderson hospital in [MASK] and another in marion park [MASK] in the 1950s . 
the meridian post office with [MASK] interior done entirely [of] bronze and verde marble is also [MASK] as a very fine example of [MASK] type of post office structures built in thriving and [MASK] to do cities [MASK] the 1920s and [MASK] had lal [MASK] lighting [MASK] was removed sadly during a 1960s re ##mo ##del ##ing and which are now in private residences on pop ##lar springs drive and in north hills [MASK]	meridian is right ##ly considered an architectural treasure tr ##ove being one the nations most intact cities from [the] turn of the last century [.] architecture students from around [the] nation [and] canada are known to [visit] meridian in [groups] as part of their course ##work due to numerous structures in the city having been designed by noted architects . the only home in the us south [designed] by noted [canadian] born architect [louis] s . curtiss , famous for [in] ##venting the glass curtain wall skyscraper [,] [is] extant on highland park . the frank fort designed [three] [##foot] building is generally considered one of the best [art] deco skyscraper [##s] [in] the us and is [often] compared to [detroit] ' s famed fisher building [.] noted california [architect] wallace ne ##ff designed a number of homes [in] [meridian] as well [as] in [the] alabama black [belt] [which] [ad] ##jo ##ins the city across the [nearby] [alabama] state line [.] he had relatives in meridian and selma who were [executives] in the then thriving railroad industry and would take commissions [in] the area when [commissions] [in] california were lean . [his] work is mostly [concentrated] in the lower numbered blocks of pop ##lar springs drive where his 251 ##6 pop ##lar [springs] [drive] is [often] compared to the similarly designed falcon lair [,] the beverly hills [home] in benedict canyon of rudolph valentin ##o . [one] ne ##ff work was lost to an expansion of anderson hospital in [1990] and another in marion park [burned] in the 1950s . 
the meridian post office with [its] interior done entirely [of] bronze and verde marble is also [noteworthy] as a very fine example of [the] type of post office structures built in thriving and [well] to do cities [in] the 1920s and [originally] had lal [##ique] lighting [which] was removed sadly during a 1960s re ##mo ##del ##ing and which are now in private residences on pop ##lar springs drive and in north hills [.]	meridian is right ##ly considered an architectural treasure tr ##ove being one the nations most intact cities from [the] turn of the last century [.] architecture students from around [the] nation [and] canada are known to [visit] meridian in [particular] as part of their course ##work due to numerous structures in the city having been designed by noted architects . the only home in the us south [is] by noted [american] born architect [william] s . curtiss , famous for [in] ##venting the glass curtain wall skyscraper [that] [still] extant on highland park . the frank fort designed [the] [tower] building is generally considered one of the best [art] deco skyscraper [##s] [in] the us and is [often] compared to [detroit] ' s famed fisher building [.] noted california [architect] wallace ne ##ff designed a number of homes [in] [meridian] as well [as] in [the] alabama black [hills] [which] [ad] ##jo ##ins the city across the [alabama] [tennessee] state line [.] he had relatives in meridian and selma who were [involved] in the then thriving railroad industry and would take commissions [in] the area when [the] [in] california were lean . [his] work is mostly [concentrated] in the lower numbered blocks of pop ##lar springs drive where his 251 ##6 pop ##lar [springs] [wheeler] is [often] compared to the similarly designed falcon lair [and] the beverly hills [and] in benedict canyon of rudolph valentin ##o . [the] ne ##ff work was lost to an expansion of anderson hospital in [1950] and another in marion park [beginning] in the 1950s . 
the meridian post office with [its] interior done entirely [of] bronze and verde marble is also [regarded] as a very fine example of [the] type of post office structures built in thriving and [well] to do cities [in] the 1920s and [1930s] had lal [##l] lighting [which] was removed sadly during a 1960s re ##mo ##del ##ing and which are now in private residences on pop ##lar springs drive and in north hills [.]
1	[MASK] cap is [light] [MASK] @ - @ brown , with a diameter typically ranging from 1 [MASK] 4 @ [##我] @ 5 cm ( 0 @ . @ 4 [MASK] 1 @ . @ 8 in ) . initially con ##ic to bell @ - @ shaped to convex , it flat ##tens during [MASK] [MASK] [MASK] [visible] surface grooves corresponding to the gills underneath the cap . the margin of the cap has minute but distinct [MASK] [MASK] ##ops . the surface is moist and smooth , and h ##y ##gr ##op ##han ##ous [MASK] the cap frequently develops splits [MASK] [fraternity] margin , or cracks in the [MASK] [MASK] the central part of the cap ) . the flesh of the cap is thick [MASK] [MASK] [repertory] but [MASK] elsewhere , gray ##ish to whitish , fragile , and with a slightly meal ##y odor [MASK] taste . the gills have a dec ##urrent attachment to the stem ( [##pta] [MASK] , running down the length of [MASK] [MASK] [MASK] and are a pale brownish color with ting ##es of red . they are broad ( between 3 and 6 mm ) , and have a close to sub ##dis ##tan [MASK] spa ##cing , with about 26 – 35 gills reaching the stem . the fragile stem is 3 to [MASK] [MASK] [MASK] [MASK] @ . @ 2 to 3 @ . @ 5 in ) long by [MASK] [MASK] . @ [MASK] to 0 @ . @ 4 cm [MASK] 0 @ [MASK] @ 06 to 0 @ . [cm] 16 in ) thick and yellow [MASK] yellow @ [-] @ brown , becoming reddish @ - @ brown to orange @ - @ brown [MASK] [MASK] bottom half [in] maturity [MASK] [MASK] lower portion [MASK] young stems is covered with white fl ##eck ##s . roughly equal in [MASK] at [MASK] [MASK] and bottom , the base of the [MASK] is covered by [MASK] yellowish my ##cel ##ium [that] can be [up] to a third of the length of the stem . the ed ##ibility of the mushroom [MASK] " doubtful [MASK] and consumption " best avoided " .	[the] cap is [light] [reddish] @ - @ brown , with a diameter typically ranging from 1 [to] 4 @ [.] @ 5 cm ( 0 @ . @ 4 [to] 1 @ . @ 8 in ) . 
initially con ##ic to bell @ - @ shaped to convex , it flat ##tens during [maturity] [,] [developing] [visible] surface grooves corresponding to the gills underneath the cap . the margin of the cap has minute but distinct [sc] [##all] ##ops . the surface is moist and smooth , and h ##y ##gr ##op ##han ##ous [.] the cap frequently develops splits [in] [the] margin , or cracks in the [disc] [(] the central part of the cap ) . the flesh of the cap is thick [in] [the] [center] but [thin] elsewhere , gray ##ish to whitish , fragile , and with a slightly meal ##y odor [and] taste . the gills have a dec ##urrent attachment to the stem ( [that] [is] , running down the length of [the] [stem] [)] and are a pale brownish color with ting ##es of red . they are broad ( between 3 and 6 mm ) , and have a close to sub ##dis ##tan [##t] spa ##cing , with about 26 – 35 gills reaching the stem . the fragile stem is 3 to [9] [cm] [(] [1] @ . @ 2 to 3 @ . @ 5 in ) long by [0] [@] . @ [15] to 0 @ . @ 4 cm [(] 0 @ [.] @ 06 to 0 @ . [@] 16 in ) thick and yellow [to] yellow @ [-] @ brown , becoming reddish @ - @ brown to orange @ - @ brown [in] [the] bottom half [in] maturity [.] [the] lower portion [of] young stems is covered with white fl ##eck ##s . roughly equal in [thickness] at [the] [top] and bottom , the base of the [stem] is covered by [a] yellowish my ##cel ##ium [that] can be [up] to a third of the length of the stem . the ed ##ibility of the mushroom [is] " doubtful ["] and consumption " best avoided " .	[the] cap is [light] [brown] @ - @ brown , with a diameter typically ranging from 1 [.] 4 @ [.] @ 5 cm ( 0 @ . @ 4 [@] 1 @ . @ 8 in ) . initially con ##ic to bell @ - @ shaped to convex , it flat ##tens during [growth] [,] [with] [visible] surface grooves corresponding to the gills underneath the cap . the margin of the cap has minute but distinct [white] [wall] ##ops . 
the surface is moist and smooth , and h ##y ##gr ##op ##han ##ous [(] the cap frequently develops splits [,] [the] margin , or cracks in the [cap] [of] the central part of the cap ) . the flesh of the cap is thick [in] [in] [,] but [,] elsewhere , gray ##ish to whitish , fragile , and with a slightly meal ##y odor [and] taste . the gills have a dec ##urrent attachment to the stem ( [gills] [)] , running down the length of [the] [cap] [,] and are a pale brownish color with ting ##es of red . they are broad ( between 3 and 6 mm ) , and have a close to sub ##dis ##tan [##t] spa ##cing , with about 26 – 35 gills reaching the stem . the fragile stem is 3 to [4] [4] [.] [@] @ . @ 2 to 3 @ . @ 5 in ) long by [4] [@] . @ [1] to 0 @ . @ 4 cm [(] 0 @ [.] @ 06 to 0 @ . [@] 16 in ) thick and yellow [to] yellow @ [-] @ brown , becoming reddish @ - @ brown to orange @ - @ brown [in] [in] bottom half [in] maturity [.] [the] lower portion [of] young stems is covered with white fl ##eck ##s . roughly equal in [size] at [top] [top] and bottom , the base of the [stem] is covered by [a] yellowish my ##cel ##ium [that] can be [up] to a third of the length of the stem . the ed ##ibility of the mushroom [is] " doubtful ["] and consumption " best avoided " .
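In the text column above, bracketed positions fall into three groups: most show `[MASK]`, a few show an unrelated token such as `[meta]`, and a few keep their original token such as `[detroit]`. That matches the standard BERT corruption recipe. A sketch under the assumption of the usual 15% selection rate and 80/10/10 split, which may not be exactly what `BertMLMStrategy` is configured to use:

```python
import random


def bert_style_mask(tokens, vocab, rng, mlm_prob=0.15, mask_token="[MASK]"):
    """Corrupt tokens BERT-style; labels are None wherever no prediction is required."""
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() >= mlm_prob:   # position not selected: leave it alone
            corrupted.append(tok)
            labels.append(None)
            continue
        labels.append(tok)             # selected: the model must recover this token
        roll = rng.random()
        if roll < 0.8:                 # 80%: replace with the mask token
            corrupted.append(mask_token)
        elif roll < 0.9:               # 10%: replace with a random vocabulary token
            corrupted.append(rng.choice(vocab))
        else:                          # 10%: keep the original token
            corrupted.append(tok)
    return corrupted, labels


rng = random.Random(0)
tokens = "the cap is light reddish brown".split()
vocab = ["meta", "wheeler", "cm", "discipline"]
corrupted, labels = bert_style_mask(tokens, vocab, rng)
```

The loss, and therefore the `lm_accuracy` metric, is computed only at the selected positions.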

# grab the Blurr transform so we can reach its Hugging Face tokenizer (and its mask token)
tfm = first_blurr_tfm(learn.dls)

Learner.blurr_fill_mask works here too

learn.blurr_fill_mask(f"Blurr is a {tfm.hf_tokenizer.mask_token}.", n_preds=5)

['Blurr is a word.',
 'Blurr is a game.',
 'Blurr is a term.',
 'Blurr is a concept.',
 'Blurr is a name.']

Summarization

raw_datasets = load_dataset("cnn_dailymail", "3.0.0")
print(f"{raw_datasets}\n")
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')

Reusing dataset cnn_dailymail (/home/wgilliam/.cache/huggingface/datasets/cnn_dailymail/3.0.0/3.0.0/3cb851bf7cf5826e45d49db2863f627cba583cbc32342df7349dfe6c38060234)

DatasetDict({
    train: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 287113
    })
    validation: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 13368
    })
    test: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 11490
    })
})

{'article': 'It\'s official: U.S. President Barack Obama wants lawmakers to weigh in on whether to use military force in Syria. Obama sent a letter to the heads of the House and Senate on Saturday night, hours after announcing that he believes military action against Syrian targets is the right step to take over the alleged use of chemical weapons. The proposed legislation from Obama asks Congress to approve the use of military force "to deter, disrupt, prevent and degrade the potential for future uses of chemical weapons or other weapons of mass destruction." It\'s a step that is set to turn an international crisis into a fierce domestic political battle. There are key questions looming over the debate: What did U.N. weapons inspectors find in Syria? What happens if Congress votes no? And how will the Syrian government react? In a televised address from the White House Rose Garden earlier Saturday, the president said he would take his case to Congress, not because he has to -- but because he wants to. "While I believe I have the authority to carry out this military action without specific congressional authorization, I know that the country will be stronger if we take this course, and our actions will be even more effective," he said. "We should have this debate, because the issues are too big for business as usual." Obama said top congressional leaders had agreed to schedule a debate when the body returns to Washington on September 9. The Senate Foreign Relations Committee will hold a hearing over the matter on Tuesday, Sen. Robert Menendez said. Transcript: Read Obama\'s full remarks . Syrian crisis: Latest developments . U.N. inspectors leave Syria . Obama\'s remarks came shortly after U.N. inspectors left Syria, carrying evidence that will determine whether chemical weapons were used in an attack early last week in a Damascus suburb. 
"The aim of the game here, the mandate, is very clear -- and that is to ascertain whether chemical weapons were used -- and not by whom," U.N. spokesman Martin Nesirky told reporters on Saturday. But who used the weapons in the reported toxic gas attack in a Damascus suburb on August 21 has been a key point of global debate over the Syrian crisis. Top U.S. officials have said there\'s no doubt that the Syrian government was behind it, while Syrian officials have denied responsibility and blamed jihadists fighting with the rebels. British and U.S. intelligence reports say the attack involved chemical weapons, but U.N. officials have stressed the importance of waiting for an official report from inspectors. The inspectors will share their findings with U.N. Secretary-General Ban Ki-moon Ban, who has said he wants to wait until the U.N. team\'s final report is completed before presenting it to the U.N. Security Council. The Organization for the Prohibition of Chemical Weapons, which nine of the inspectors belong to, said Saturday that it could take up to three weeks to analyze the evidence they collected. "It needs time to be able to analyze the information and the samples," Nesirky said. He noted that Ban has repeatedly said there is no alternative to a political solution to the crisis in Syria, and that "a military solution is not an option." Bergen:  Syria is a problem from hell for the U.S. Obama: \'This menace must be confronted\' Obama\'s senior advisers have debated the next steps to take, and the president\'s comments Saturday came amid mounting political pressure over the situation in Syria. Some U.S. lawmakers have called for immediate action while others warn of stepping into what could become a quagmire. Some global leaders have expressed support, but the British Parliament\'s vote against military action earlier this week was a blow to Obama\'s hopes of getting strong backing from key NATO allies. 
On Saturday, Obama proposed what he said would be a limited military action against Syrian President Bashar al-Assad. Any military attack would not be open-ended or include U.S. ground forces, he said. Syria\'s alleged use of chemical weapons earlier this month "is an assault on human dignity," the president said. A failure to respond with force, Obama argued,  "could lead to escalating use of chemical weapons or their proliferation to terrorist groups who would do our people harm. In a world with many dangers, this menace must be confronted." Syria missile strike: What would happen next? Map: U.S. and allied assets around Syria . Obama decision came Friday night . On Friday night, the president made a last-minute decision to consult lawmakers. What will happen if they vote no? It\'s unclear. A senior administration official told CNN that Obama has the authority to act without Congress -- even if Congress rejects his request for authorization to use force. Obama on Saturday continued to shore up support for a strike on the al-Assad government. He spoke by phone with French President Francois Hollande before his Rose Garden speech. "The two leaders agreed that the international community must deliver a resolute message to the Assad regime -- and others who would consider using chemical weapons -- that these crimes are unacceptable and those who violate this international norm will be held accountable by the world," the White House said. Meanwhile, as uncertainty loomed over how Congress would weigh in, U.S. military officials said they remained at the ready. 5 key assertions: U.S. intelligence report on Syria . Syria: Who wants what after chemical weapons horror . Reactions mixed to Obama\'s speech . A spokesman for the Syrian National Coalition said that the opposition group was disappointed by Obama\'s announcement. "Our fear now is that the lack of action could embolden the regime and they repeat his attacks in a more serious way," said spokesman Louay Safi. 
"So we are quite concerned." Some members of Congress applauded Obama\'s decision. House Speaker John Boehner, Majority Leader Eric Cantor, Majority Whip Kevin McCarthy and Conference Chair Cathy McMorris Rodgers issued a statement Saturday praising the president. "Under the Constitution, the responsibility to declare war lies with Congress," the Republican lawmakers said. "We are glad the president is seeking authorization for any military action in Syria in response to serious, substantive questions being raised." More than 160 legislators, including 63 of Obama\'s fellow Democrats, had signed letters calling for either a vote or at least a "full debate" before any U.S. action. British Prime Minister David Cameron, whose own attempt to get lawmakers in his country to support military action in Syria failed earlier this week, responded to Obama\'s speech in a Twitter post Saturday. "I understand and support Barack Obama\'s position on Syria," Cameron said. An influential lawmaker in Russia -- which has stood by Syria and criticized the United States -- had his own theory. "The main reason Obama is turning to the Congress:  the military operation did not get enough support either in the world, among allies of the US or in the United States itself," Alexei Pushkov, chairman of the international-affairs committee of the Russian State Duma, said in a Twitter post. In the United States, scattered groups of anti-war protesters around the country took to the streets Saturday. "Like many other Americans...we\'re just tired of the United States getting involved and invading and bombing other countries," said Robin Rosecrans, who was among hundreds at a Los Angeles demonstration. What do Syria\'s neighbors think? Why Russia, China, Iran stand by Assad . Syria\'s government unfazed . 
After Obama\'s speech, a military and political analyst on Syrian state TV said Obama is "embarrassed" that Russia opposes military action against Syria, is "crying for help" for someone to come to his rescue and is facing two defeats -- on the political and military levels. Syria\'s prime minister appeared unfazed by the saber-rattling. "The Syrian Army\'s status is on maximum readiness and fingers are on the trigger to confront all challenges," Wael Nader al-Halqi said during a meeting with a delegation of Syrian expatriates from Italy, according to a banner on Syria State TV that was broadcast prior to Obama\'s address. An anchor on Syrian state television said Obama "appeared to be preparing for an aggression on Syria based on repeated lies." A top Syrian diplomat told the state television network that Obama was facing pressure to take military action from Israel, Turkey, some Arabs and right-wing extremists in the United States. "I think he has done well by doing what Cameron did in terms of taking the issue to Parliament," said Bashar Jaafari, Syria\'s ambassador to the United Nations. Both Obama and Cameron, he said, "climbed to the top of the tree and don\'t know how to get down." The Syrian government has denied that it used chemical weapons in the August 21 attack, saying that jihadists fighting with the rebels used them in an effort to turn global sentiments against it. British intelligence had put the number of people killed in the attack at more than 350. On Saturday, Obama said "all told, well over 1,000 people were murdered." U.S. Secretary of State John Kerry on Friday cited a death toll of 1,429, more than 400 of them children. No explanation was offered for the discrepancy. Iran: U.S. 
military action in Syria would spark \'disaster\' Opinion: Why strikes in Syria are a bad idea .', 'highlights': 'Syrian official: Obama climbed to the top of the tree, "doesn\'t know how to get down"\nObama sends a letter to the heads of the House and Senate .\nObama to seek congressional approval on military action against Syria .\nAim is to determine whether CW were used, not by whom, says U.N. spokesman .', 'id': '0001d1afc246a7964130f43ae940af6bc6c57f01'}

{'article': Value(dtype='string', id=None), 'highlights': Value(dtype='string', id=None), 'id': Value(dtype='string', id=None)}

# use a small subset of each split to keep the example fast
train_ds = raw_datasets["train"].select(range(1000))
valid_ds = raw_datasets["validation"].select(range(500))

# after concatenation the validation rows come directly after the training rows
n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])

learn = BlearnerForSummarization.from_data(
    raw_ds,
    "facebook/bart-large-cnn",
    text_attr="article",
    summary_attr="highlights",
    max_length=256,
    max_target_length=130,
    dblock_splitter=IndexSplitter(valid_idxs),
    dl_kwargs={"bs": 2},
).to_fp16()

learn.dls.show_batch(dataloaders=learn.dls, max_n=2, input_trunc_at=500, target_trunc_at=250)

	text	target
0	<s> (CNN) -- When Ji Yeqing awakened, she was already in the recovery room. Chinese authorities had dragged her out of her home and down four flights of stairs, she said, restraining and beating her husband as he tried to come to her aid. They whisked her into a clinic, held her down on a bed and forced her to undergo an abortion. Her offense? Becoming pregnant with a second child, in violation of China's one-child policy. "After the abortion, I felt empty, as if something was scooped out of me,	China's one-child policy results in forced abortions and sterilizations, activists say.\nWomen tell of emotional and physical consequences from the procedures.\nActivist Chen Guangcheng works to advocate for victims of such practices.
1	<s> (CNN Student News) -- January 13, 2011. Download PDF maps related to today's show:. • Arizona • Australia. Transcript. THIS IS A RUSH TRANSCRIPT. THIS COPY MAY NOT BE IN ITS FINAL FORM AND MAY BE UPDATED. CARL AZUZ, CNN STUDENT NEWS ANCHOR: A problem that won't be solved, even if the solution is clear. The story and the reasons, leading off today's broadcast of CNN Student News! My name is Carl Azuz! First Up: Winter Storm Woes. AZUZ: Florida is the only state in the union without snow on th	A winter storm slams the northeastern United States.\nThe U.S. House of Representatives condemns the Arizona shooting.\nMassive floods leave vast areas of Australia underwater.\nUse the Daily Discussion to help students understand today's featured news

metrics_cb = BlearnerForSummarization.get_metrics_cb()
learn.fit_one_cycle(1, lr_max=4e-5, cbs=[metrics_cb])

epoch	train_loss	valid_loss	rouge1	rouge2	rougeL	rougeLsum	bertscore_precision	bertscore_recall	bertscore_f1	time
0	1.805652	1.987627	0.348567	0.148174	0.243829	0.319418	0.871049	0.893810	0.882177	10:06

learn.show_results(learner=learn, max_n=2, input_trunc_at=500, target_trunc_at=250)

	text	target	prediction
0	(CNN)Reading the headlines out of Madison, Wisconsin, it's hard not to think about Ferguson, Missouri. But law enforcement's response to the shooting of 19-year-old Tony Robinson will not unfold in the same chaotic, violent and distrusting way as the shooting of 18-year-old Michael Brown, Madison's top police leaders vowed. "I think it's very clear that Madison, Wisconsin, is not Ferguson, Missouri," said Jim Palmer, the executive director of the Wisconsin Professional Police Association. The h	Police officials in Madison say their responses to shooting by officer reflect their role in community.\nOne example: Madison chief talked to teen's family soon after shooting.\nA month went by before Ferguson chief apologized to Brown's family.	[ Law enforcement in Madison, Wisconsin, says it has a strong relationship with the people it serves .\nPolice chief says he understands people are angry and want answers .\nChief says he went to the shooting victim's mother's home within hours of the shooting .\nThe chief says the department is working to "bring community back into the fold" of the community ., ISIS claimed responsibility for Yemen's deadliest terror attack on Friday .\nThe group's momentum may have stalled in Syria and Iraq, but its supporters appear to be heeding its call to "erupt volcanoes of jihad"\nISIS was only thought to have a fledgling presence in Yemen and had only claimed one previous attack .]

Learner.blurr_generate works here too. Below, num_return_sequences=3 requests three candidate summaries for a single article:

test_article = """
About 10 men armed with pistols and small machine guns raided a casino in Switzerland and made off 
into France with several hundred thousand Swiss francs in the early hours of Sunday morning, police said. 
The men, dressed in black clothes and black ski masks, split into two groups during the raid on the Grand Casino 
Basel, Chief Inspector Peter Gill told CNN. One group tried to break into the casino's vault on the lower level 
but could not get in, but they did rob the cashier of the money that was not secured, he said. The second group 
of armed robbers entered the upper level where the roulette and blackjack tables are located and robbed the 
cashier there, he said. As the thieves were leaving the casino, a woman driving by and unaware of what was 
occurring unknowingly blocked the armed robbers' vehicles. A gunman pulled the woman from her vehicle, beat 
her, and took off for the French border. The other gunmen followed into France, which is only about 100 
meters (yards) from the casino, Gill said. There were about 600 people in the casino at the time of the robbery. 
There were no serious injuries, although one guest on the Casino floor was kicked in the head by one of the 
robbers when he moved, the police officer said. Swiss authorities are working closely with French authorities, 
Gill said. The robbers spoke French and drove vehicles with French license plates. CNN's Andreena Narayan 
contributed to this report.
"""

outputs = learn.blurr_generate(test_article, num_return_sequences=3)

for idx, o in enumerate(outputs):
    print(f"=== Prediction {idx+1} ===\n{o}\n")

=== Prediction 1 ===
{'generated_texts': [" Robbers made off with several hundred thousand Swiss francs in the early hours of Sunday morning, police say .\nThe men, dressed in black clothes and black ski masks, split into two groups during the raid on the Grand Casino Basel .\nOne group tried to break into the casino's vault on the lower level, but could not get in .\nA woman driving by unknowingly blocked the robbers' vehicles and a gunman beat her to death .\nThere were no serious injuries, although one guest on the Casino floor was kicked in the head .", " Robbers made off with several hundred thousand Swiss francs in the early hours of Sunday morning, police say .\nThe men, dressed in black clothes and black ski masks, split into two groups during the raid on the Grand Casino Basel .\nOne group tried to break into the casino's vault on the lower level, but could not get in .\nA woman driving by unknowingly blocked the robbers' vehicles and a gunman beat her to death .\nThere were about 600 people in the casino at the time of the robbery .", " Robbers made off with several hundred thousand Swiss francs in the early hours of Sunday morning, police say .\nThe men, dressed in black clothes and black ski masks, split into two groups during the raid on the Grand Casino Basel .\nOne group tried to break into the casino's vault on the lower level, but could not get in .\nA woman driving by unknowingly blocked the robbers' vehicles and a gunman beat her to death ."]}
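
In the run above, outputs holds a single dictionary whose generated_texts entry is a list of three strings, so the loop prints only one "Prediction" block. The sketch below shows one way to print each candidate separately. It assumes only the output shapes visible in this document (a list under generated_texts here, a plain string in the translation example later on); iter_generated_texts is a hypothetical helper, not part of Blurr, and the example data is made up.

```python
def iter_generated_texts(outputs):
    """Yield each generated string from a list of {'generated_texts': ...} dicts."""
    for result in outputs:
        texts = result["generated_texts"]
        # A single sequence may come back as a plain string rather than a list.
        if isinstance(texts, str):
            texts = [texts]
        for text in texts:
            yield text.strip()


# Made-up stand-in with the same shape as the output above (not real model output).
example_outputs = [{"generated_texts": [" first candidate .", " second candidate ."]}]

flat = list(iter_generated_texts(example_outputs))
for idx, text in enumerate(flat, start=1):
    print(f"=== Prediction {idx} ===\n{text}\n")
```

With the real outputs from learn.blurr_generate, the same helper would be called as iter_generated_texts(outputs).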

Translation

raw_datasets = load_dataset("wmt16", "de-en")
print(f"{raw_datasets}\n")
print(f'{raw_datasets["train"][0]}\n')
print(f'{raw_datasets["train"].features}\n')

Reusing dataset wmt16 (/home/wgilliam/.cache/huggingface/datasets/wmt16/de-en/1.0.0/af3c5d746b307726d0de73ebe7f10545361b9cb6f75c83a1734c000e48b6264f)

DatasetDict({
    train: Dataset({
        features: ['translation'],
        num_rows: 4548885
    })
    validation: Dataset({
        features: ['translation'],
        num_rows: 2169
    })
    test: Dataset({
        features: ['translation'],
        num_rows: 2999
    })
})

{'translation': {'de': 'Wiederaufnahme der Sitzungsperiode', 'en': 'Resumption of the session'}}

{'translation': Translation(languages=['de', 'en'], id=None)}

train_ds = raw_datasets["train"].select(range(1000))
valid_ds = raw_datasets["validation"].select(range(500))

n_train, n_valid = train_ds.num_rows, valid_ds.num_rows
train_idxs, valid_idxs = L(range(n_train)), L(range(n_train, n_train + n_valid))
raw_ds = concatenate_datasets([train_ds, valid_ds])

def make_dict(item):
    return item["translation"]


raw_ds = raw_ds.map(make_dict)

Loading cached processed dataset at /home/wgilliam/.cache/huggingface/datasets/wmt16/de-en/1.0.0/af3c5d746b307726d0de73ebe7f10545361b9cb6f75c83a1734c000e48b6264f/cache-33c5db1dfc9c3c37.arrow

train_df = pd.DataFrame(raw_ds["translation"], columns=["de", "en"])

learn = BlearnerForTranslation.from_data(
    train_df,
    "Helsinki-NLP/opus-mt-de-en",
    src_lang_name="German",
    src_lang_attr="de",
    trg_lang_name="English",
    trg_lang_attr="en",
    dblock_splitter=RandomSplitter(),
    dl_kwargs={"bs": 2},
).to_fp16()
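
Note that this learner is built with RandomSplitter() rather than IndexSplitter(valid_idxs), so the valid_idxs computed earlier are not used and the validation set is drawn at random from all 1,500 rows. The sketch below is a rough pure-Python analog of such a split, assuming fastai's default valid_pct of 0.2; random_split is a hypothetical helper and the seed is chosen only to make the example repeatable.

```python
import random


def random_split(n_items, valid_pct=0.2, seed=None):
    """Hypothetical pure-Python analog of a random train/validation split."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)
    # The first `cut` shuffled indices become validation, the rest training.
    return idxs[cut:], idxs[:cut]


n_train, n_valid = 1000, 500  # sizes selected in the cells above
rand_train, rand_valid = random_split(n_train + n_valid, seed=42)

# Count how many randomly chosen validation rows come from the original validation slice.
from_official_valid = sum(i >= n_train for i in rand_valid)
print(len(rand_train), len(rand_valid), from_official_valid)
```

Under that assumption, about 300 rows are held out and they mix examples from both original splits. To evaluate on the original validation slice only, passing dblock_splitter=IndexSplitter(valid_idxs), as in the summarization example, should work since train_df preserves the row order of raw_ds.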

learn.dls.show_batch(dataloaders=learn.dls, max_n=2, input_trunc_at=500, target_trunc_at=250)

	text	target
0	▁Angesichts▁dieser Situation▁muß▁aus dem▁Bericht, den das▁Parlament annimmt,▁klar▁hervorgehen,▁daß▁Maßnahmen▁notwendig▁sind, die▁eindeutig auf die▁Bekämpfung der relativen▁Armut und der Arbeitslosigkeit▁gerichtet▁sind.▁Maßnahmen▁wie die für diese▁Zwecke▁angemessene▁Verwendung der▁Strukturfonds, die▁häufig▁unsachgemäß▁eingesetzt▁werden, und▁zwar mit▁zentralen▁staatlichen▁Politiken, die▁Modernisierung der▁Bereiche Telekommunikation und▁Kommunikation,▁indem man vor▁allem die am▁wenigsten▁entwickelt	Given this situation, the report approved by Parliament must highlight the need for measures that aim unequivocally to fight relative poverty and unemployment: measures such as the appropriate use of structural funds for these purposes, which are oft
1	In▁unseren Änderungsanträgen▁haben wir▁festgeschrieben,▁welche▁Bedeutung wir der Herausbildung der▁notwendigen▁Synergien▁zwischen den▁Strukturfonds, dem▁Kohäsionsfonds und den▁Gemeinschaftsinitiativen▁beimessen, so▁daß▁ihre▁Anwendung auf▁optimale und rentabelste▁Weise im▁zunehmenden▁Abbau der▁regionalen▁Ungleichheiten und in der▁Schaffung von▁Arbeitsplätzen▁ihren▁Niederschlag▁findet, die▁letztendlich die▁beiden▁Hauptziele der hier zur▁Debatte▁stehenden▁Fonds▁sind.<pad><pad><pad><pad><pad><pad><p	In our amendments, we have stated the importance of the necessary synergies being produced between the Structural Funds, the Cohesion Fund and Community initiatives, so that their application should be reflected, in the best and most profitable way,

metrics_cb = BlearnerForTranslation.get_metrics_cb()
learn.fit_one_cycle(1, lr_max=4e-5, cbs=[metrics_cb])

[nltk_data] Downloading package wordnet to /home/wgilliam/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to /home/wgilliam/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /home/wgilliam/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!

epoch	train_loss	valid_loss	bleu	meteor	sacrebleu	time
0	1.360521	1.345847	0.334178	0.613131	31.482812	01:14

learn.show_results(learner=learn, max_n=2, input_trunc_at=500, target_trunc_at=250)

	text	target	prediction
0	▁Angesichts▁dessen▁müssen wir in▁diesem▁Parlament auf▁jeden Fall▁verlangen,▁daß die▁gemeinschaftlichen▁Förderkonzepte für den▁fraglichen▁Zeitraum in▁diesem▁Parlament vor▁ihrer▁Annahme▁geprüft und▁erörtert▁werden, und▁zwar▁anhand der▁Leitlinien, die wir▁heute▁vorlegen,▁denn wir▁halten▁sie für▁ganz▁besonders▁geeignet,▁Arbeitsplätze in den▁ärmsten oder am▁wenigsten▁entwickelten▁Regionen zu▁schaffen, und so▁tragen wir▁dazu▁bei, den▁negativen, zur▁Ungleichheit▁führenden▁Tendenzen in der▁europäischen	Bearing this in mind, this House should, in any event, demand that, before the Community support frameworks for the period in question are approved, they be studied and submitted for debate in this Parliament, specifically in light of the guidelines	[In view of this, we in this Parliament must in any case demand that the Community support frameworks for the period in question be examined and discussed in this Parliament before they are adopted, on the basis of the guidelines that we are presenting today, because we consider them to be particularly suitable for creating jobs in the poorest or least developed regions, and thus we are helping to counter the negative trends leading to inequality in European society, so that we can achieve a fairer Europe., The guidelines, moreover, are based on two horizontal principles: rural development - and the issue of a sustainable transport structure, Madam rapporteur, which has been at my heart for a long time, especially since my time as my country' s Minister for the Environment - and the second principle is equal opportunities, especially between women and men, as well as the European employment strategy and economic and monetary union.]

Learner.blurr_generate works for translation too:

test_de = "Ich trinke gerne Bier"

learn.blurr_generate(test_de)

[{'generated_texts': 'I like to drink beer'}]

Summary

In summary, whether you want to work with Blurr’s low-, mid-, or high-level API, we've got you covered :)