
Implemented lazy line-by-line text data set loading for LM example script #4009


Conversation

GCHQResearcher92457

See PR #3388. Master changed substantially, requiring relocation of code into previously untouched files, etc. Instead, here is a new PR using the same code, but refactored to fit into the new, more modular structure of the scripts in examples.

@Joppewouts

Used this for training a model, worked great! Would love to see this integrated.

Member

@julien-c julien-c left a comment


My main question is about the force_pad_token option/feature. What do you think, @LysandreJik @patrickvonplaten @BramVanroy?

metadata={
"help": "Whether to force the addition of a padding token to tokenizer that does not already have one."
},
)
Member

I'm not a fan of this option personally (also see #4122 (comment))

I'd rather the example scripts do not modify the specified tokenizer – I feel like advanced users should modify their tokenizer off-script.

Member

what do you think @patrickvonplaten?

Contributor

Hmm, same from my side. I think one should use Trainer with DataCollatorForLanguageModeling and use a fitting tokenizer.

GPT2 is a heavily used model, though, and it would be nice to allow using it via this script. Another possibility I could see here is to add a pad_token_id to the args that would set the PAD_TOKEN to the provided id. So for GPT2, one could do --pad_token_id 50256. On the other hand, pad_token_id seems to be quite a specific param to add to the args, so it might be cleaner to just not allow this case and force the user to use Trainer + their own DataCollator.

# See PR 3388. Some tokenizers don't have pad tokens, which causes errors at the encoding step in the collate_fn.
# We give here the option to force the addition of a pad token. The attention mask is used to ignore this token
# when feeding to the model.
tokenizer.add_special_tokens({"pad_token": "<pad>"})
Contributor

I don't think this will work since it will give the pad_token the id len(tokenizer) + 1, which does not exist in the model embedding weights. What one could do for GPT2 is to set the pad_token_id = eos_token => tokenizer.pad_token = tokenizer.eos_token. Since GPT2 uses causal masking, this should be fine.

Collaborator

@patrickvonplaten I think it should work since later on the embeddings are resized:

model.resize_token_embeddings(len(tokenizer))

Contributor

@patrickvonplaten patrickvonplaten Jun 15, 2020

Ah yeah, you're right - I didn't even realize that there is an embedding resize in the script as well.

I don't think one should add a new embedding weight for GPT2 though, but rather reuse some other token for padding (they are all masked anyway, ...) so that the whole model doesn't have to be retrained. model.resize_token_embeddings only adds new tokens if len(tokenizer) is greater than model.old_embedding_weights. A lot of people just wanting to fine-tune GPT2 can run into bad performance here without knowing why.
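
A minimal sketch contrasting the two options discussed above, using standard transformers calls (illustrative only, not the PR's actual code):

from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelWithLMHead.from_pretrained("gpt2")

# Option (a): add a brand-new <pad> token. len(tokenizer) grows, so
# resize_token_embeddings creates one fresh, untrained embedding row.
tokenizer.add_special_tokens({"pad_token": "<pad>"})
model.resize_token_embeddings(len(tokenizer))

# Option (b): reuse the existing EOS token as padding. The vocabulary size is
# unchanged, no new weights are added, and padded positions are masked anyway.
tokenizer.pad_token = tokenizer.eos_token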

Contributor

Thinking a bit more about it, I agree with @julien-c and think people should just adapt this script for their own (GPT2) needs. It's really not that long, and providing hacky functionality here is not worth it.

Member

And the script does work for GPT2 by default, i.e. if you don't opt in to --line_by_line.

@julien-c
Member

@GCHQResearcher92457 @BramVanroy Does it work for you if we tweak the PR on your fork's branch so that we can remove the force_pad_token option and update a few things?

PS: Sorry about the super long review time:)

@GCHQResearcher92457
Author

@GCHQResearcher92457 @BramVanroy Does it work for you if we tweak the PR on your fork's branch so that we can remove the force_pad_token option and update a few things?

PS: Sorry about the super long review time:)

Sure. I think the GPT thing was a bit of a rabbit hole. I added the hacks with pad tokens because I thought I'd introduced a problem with lazy loading, without realising that the problem was in fact already there with line-by-line.

@BramVanroy
Collaborator

@GCHQResearcher92457 @BramVanroy Does it work for you if we tweak the PR on your fork's branch so that we can remove the force_pad_token option and update a few things?

PS: Sorry about the super long review time:)

Yes, definitely seems like a good way to go!


block_size: int = 512

def collate_batch(self, examples: List[torch.Tensor]) -> Dict[str, torch.Tensor]:

Hello 👋

Thanks for the PR! I tried the DataCollatorForLazyLanguageModeling and LazyLineByLineTextDataset with transformers==3.0.2, and somehow I had to rename collate_batch to __call__ to make it work.

Not sure if I'm missing something - dropping a note here in case someone runs into the same issue.

Thanks again!

@AliOsm
Contributor

AliOsm commented Jul 30, 2020

Hello everyone, I think this PR will be a huge addition to Transformers.
Are there any plans to finish it soon?
Thanks!

@BramVanroy
Collaborator

Hello everyone, I think this PR will be a huge addition to Transformers.
Are there any plans to finish it soon?
Thanks!

This is in the hands of @julien-c now, but I think he's on holiday at the moment.

@julien-c
Member

Isn't this superseded by huggingface/nlp now? I'll let others chime in.

@BramVanroy
Collaborator

Isn't this superseded by huggingface/nlp now? I'll let others chime in.

Are all examples now fully using nlp? If so, then yes and this can be closed. But if the examples are still using the trainer/dataset of transformers, then this seems like a separate issue.

@julien-c julien-c requested a review from sgugger August 3, 2020 13:03
@sgugger
Collaborator

sgugger commented Aug 3, 2020

I have no objection to merging this temporarily, if the remarks from the comments are taken into account, merge conflicts are handled, and the deprecated API is replaced (the data collator should implement __call__, and tokenizer.batch_encode_plus should not be used, just the tokenizer's __call__). That may be a lot of work for something that will eventually be handled by nlp, though.

Moving the examples to nlp is on my TODO for the near future, @BramVanroy, and I think @thomwolf is also planning on working on this.
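
As a rough illustration of the API change described above (the class and field names here are made up for the sketch, not the PR's actual code):

from dataclasses import dataclass
from typing import Dict, List

import torch
from transformers import PreTrainedTokenizer


@dataclass
class LazyLineByLineCollator:
    # Hypothetical sketch: tokenize raw text lines at batch time via __call__,
    # instead of the old collate_batch method.
    tokenizer: PreTrainedTokenizer
    block_size: int = 512

    def __call__(self, examples: List[str]) -> Dict[str, torch.Tensor]:
        # The tokenizer's __call__ replaces the deprecated batch_encode_plus.
        batch = self.tokenizer(
            examples,
            truncation=True,
            max_length=self.block_size,
            padding=True,
            return_tensors="pt",
        )
        return {"input_ids": batch["input_ids"], "attention_mask": batch["attention_mask"]}

A collator shaped like this can be passed straight to Trainer as data_collator, and tokenization only happens when a batch is assembled.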

@EdwardRaff

When I try to run this code following the example here I get the error below:

Traceback (most recent call last):
  File "bla.py", line 209, in <module>
    trainer.train()
  File "/home/edraff/anaconda3/lib/python3.7/site-packages/transformers/trainer.py", line 492, in train
    for step, inputs in enumerate(epoch_iterator):
  File "/home/edraff/anaconda3/lib/python3.7/site-packages/tqdm/std.py", line 1107, in __iter__
    for obj in iterable:
  File "/home/edraff/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/edraff/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/edraff/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/edraff/anaconda3/lib/python3.7/site-packages/transformers/data/data_collator.py", line 83, in __call__
    inputs, labels = self.mask_tokens(batch)
  File "/home/edraff/anaconda3/lib/python3.7/site-packages/transformers/data/data_collator.py", line 113, in mask_tokens
    labels = inputs.clone()
AttributeError: 'tuple' object has no attribute 'clone'
Epoch:   0%| | 0/1 [00:23<?, ?it/s]  Iteration:   0%| | 0/976243 [00:23<?, ?it/s]

@BramVanroy
Collaborator

When I try to run this code following the example here I get the error below:

[...]
AttributeError: 'tuple' object has no attribute 'clone'

Not sure but I think this PR hasn't been updated to reflect recent changes.

@chiyuzhang94

chiyuzhang94 commented Aug 21, 2020

Hi @GCHQResearcher92457 ,

Thanks for your great work.
I am trying to use this lazy loading pre-training script to train a RoBERTa from scratch.
I tested it many times. It works well when the training data is less than 100 million lines.

But the script is always killed at linecache.getline(...) if my training set is more than 100M lines (e.g., 1 billion).
Error is:

died with <Signals.SIGKILL: 9>.

I checked my CPU and GPU usage; they are not full. I also changed the size in the _get_n_lines(...) function and the batch size, but it still doesn't work. I don't believe this is an out-of-memory issue.

I cloned your transformers repo and used the branch lazy-text-dataset-loading-for-lm to install the transformers library.

Could you please give me any ideas about this problem?

Thanks,
Chiyu

More info:
Python: 3.6.8
Torch Version: 1.4.0
tensorflow Version: 2.3.0

I am also using distributed training to run the model.

@BramVanroy
Collaborator

BramVanroy commented Aug 23, 2020

@chiyuzhang94 You probably have a process killer running. This is a background process that monitors the memory usage of the individual processes. If the system is about to run out of memory, it will kill the offending process. My hunch is that Colab uses something similar.

The high memory usage occurs because linecache reads as much of the file into memory as it can, to have the optimal experience. Not all OSes seem to like this - although I have not had any issues with this approach on my systems.

Here's a good article: https://dev.to/rrampage/surviving-the-linux-oom-killer-2ki9

@chiyuzhang94

@chiyuzhang94 You probably have a process killer running. This is a background process that monitors the memory usage of the individual processes. If the system is about to run out of memory, it will kill the offending process. My hunch is that Colab uses something similar.

The high memory usage occurs because linecache reads as much of the file into memory as it can, to have the optimal experience. Not all OSes seem to like this - although I have not had any issues with this approach on my systems.

Here's a good article: https://dev.to/rrampage/surviving-the-linux-oom-killer-2ki9

Thanks, @BramVanroy.

I think it is hard for me to change the oom_score_adj because I need to submit a job to PBS to run the model.
I am wondering whether I can control the size of the files that linecache reads. I think the size in the function def _get_n_lines(fin, size=65536): is the controller, but it still doesn't work if I decrease the size.

@BramVanroy
Collaborator

BramVanroy commented Aug 24, 2020

@chiyuzhang94 No, that function is not related to the caching. It is a function that can very quickly read through a file to figure out how many lines it contains. The size is the chunk size in bytes to read sequentially, which is much faster than reading line by line. But again, it has nothing to do with caching.

One option that I can think of is allowing for an argument max_memory_usage that checks the current memory usage (either system memory usage or current script memory usage) at every __getitem__ call, and if the memory usage is more than max_memory_usage, the script calls linecache.clearcache(). This will be slow when you have little memory or a low max value, but it should work.

@chiyuzhang94

@chiyuzhang94 No, that function is not related to the caching. It is a function that can very quickly read through a file to figure out how many lines it contains. The size is the chunk size in bytes to read sequentially, which is much faster than reading line by line. But again, it has nothing to do with caching.

One option that I can think of is allowing for an argument max_memory_usage that checks the current memory usage (either system memory usage or current script memory usage) at every __getitem__ call, and if the memory usage is more than max_memory_usage, the script calls linecache.clearcache(). This will be slow when you have little memory or a low max value, but it should work.

Thanks, @BramVanroy ,

I tried your suggestion:

def __getitem__(self, idx):
        # Basic Memory checking from https://stackoverflow.com/a/48397534
        with open ('/proc/self/status') as f:
            memusage = f.read().split('VmRSS:')[1].split('\n')[0][:-3]

        logger.info(" memusage each time: %s", memusage)
        # If our memory usage exceeds a limit flush the cache to prevent OOM situations
        if int(memusage.strip()) > self.max_memory_usage and self.max_memory_usage > 0:
            logger.info(" memusage before: %s", memusage)
            linecache.clearcache()
            logger.info(" memusage after: %s", memusage)

        # linecache starts counting from one, not zero, +1 the given index
        return linecache.getline(self.file_path, idx + 1).rstrip()

But I found that linecache.clearcache() doesn't help, based on the log.

Iteration:   0%|          | 0/1097530 [00:00<?, ?it/s]I0826 18:51:17.077926 47405170347712
I0826 18:51:17.080428 47405170347712 ARC_run_language_modeling_emohash.py:166]  memusage before: 	38945572
I0826 18:51:17.081127 47405170347712 ARC_run_language_modeling_emohash.py:169]  memusage after: 	38945572
I0826 18:51:27.666305 47348526792384 ARC_run_language_modeling_emohash.py:162]  memusage each time: 	39182488
I0826 18:51:27.670411 47348526792384 ARC_run_language_modeling_emohash.py:166]  memusage before: 	39182488
I0826 18:51:27.670989 47348526792384 ARC_run_language_modeling_emohash.py:169]  memusage after: 	39182488
I0826 18:51:43.620446 47109816241856 ARC_run_language_modeling_emohash.py:162]  memusage each time: 	39184224
I0826 18:51:43.620970 47109816241856 ARC_run_language_modeling_emohash.py:166]  memusage before: 	39184224
I0826 18:51:43.621682 47109816241856 ARC_run_language_modeling_emohash.py:169]  memusage after: 	39184224
I0826 18:51:49.295235 47667525713600 ARC_run_language_modeling_emohash.py:162]  memusage each time: 	38993432
I0826 18:51:49.295728 47667525713600 ARC_run_language_modeling_emohash.py:166]  memusage before: 	38993432
I0826 18:51:49.296677 47667525713600 ARC_run_language_modeling_emohash.py:169]  memusage after: 	38993432

Then, the job was killed.

I noticed I am using distributed training where each node has 4 GPUs. Since each of the 4 python threads eventually reads the entire file (90GB) into memory, the dataset would take up over 360GB per node if they fully loaded it. But each node only has 186GB of RAM.

Do you have any suggestions to limit the caching size?

@shizhediao shizhediao mentioned this pull request Aug 31, 2020
@shizhediao

Any progress? @GCHQResearcher92457

@chiyuzhang94

Hi @BramVanroy @GCHQResearcher92457 ,

I found a point that might be causing memory issues in the code (https://github.com/GCHQResearcher92457/transformers/blob/lazy-text-dataset-loading-for-lm/examples/run_language_modeling.py).

In the main function, the rank 1-3 threads will all stop at the barrier at line 770, while rank 0 will progress and load the model and vocab; it will then hit line 825 and release the barrier. Once the barrier is released, threads 1-3 will process lines 770-825 (loading the model in the main function). The same goes for lines 832-837 (loading the dataset).

I have four GPUs on each node. Hence, ranks 1-3 load the model and dataset from disk individually instead of using a copy from rank 0. This leads to the OOM issue.

I think the rank 1-3 threads should not run lines 832-837 again once the barrier is released. But I added some logging and found: when a process hits a barrier, it simply waits at that spot in the code until all other processes have hit a barrier. Then, when it is released, it continues from the point where it is in the code, not jumping to the latest barrier.

I tried to add an if condition at line 770 so that only rank 0 loads the model, but I got a new error showing that the variables are not synchronized across devices: ranks 1-3 cannot get the model variable.

Did you notice this issue? Do you have any suggestions?

@BramVanroy
Collaborator

@chiyuzhang94 I am not sure why the memory is not clearing after using clearcache. It might be that you still have to call the garbage collector after clearing the cache; you can try that.

It is true that I had not thought about multi-node support, so you will indeed have multiple in-memory caches, one per process. I do not think it is easy to bypass that, short of restructuring all of the code to work with a dedicated reading process, i.e. a separate process that fetches lines from the data file.

As has been said before, though, it is now recommended to switch over to https://github.com/huggingface/nlp which allows for on-disk datasets which are fast and have a low memory footprint.
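
For reference, the garbage-collector suggestion above would amount to something like this inside the dataset's memory check (the threshold test itself is elided):

import gc
import linecache

linecache.clearcache()  # drop linecache's in-memory copies of the lines
gc.collect()  # force a collection pass; note the allocator may still hold on to some memory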

@julien-c
Member

julien-c commented Sep 8, 2020

@BramVanroy (For the sake of discussion)

Wouldn't it be reasonably easy to enable (non-cached) random access to the text file(s) by storing a list of the positions of "\n" and then doing fseeks on the fly (ideally, using a sampler that yields batches of sequential lines, so that one batch needs only one file read)?

@BramVanroy
Collaborator

@BramVanroy (For the sake of discussion)

Wouldn't it be reasonably easy to enable (non-cached) random access to the text file(s) by storing a list of the positions of "\n" and then doing fseeks on the fly (ideally, using a sampler that yields batches of sequential lines, so that one batch needs only one file read)?

Shouldn't be too hard to implement indeed, although my fear is that this might not be fast enough from an IO perspective. That is perhaps the trade-off that one would want to make, though, so it might be worth it.

You'd still need to make sure that all data is actually used, so in a shuffle setting this might not be straightforward if you want batches of consistent size. Perhaps depending on the number of lines, you can create a list of indexes that have batch_size distance between them (e.g. 0, 64, 128, 256), and then shuffle those indexes and at each iteration select one randomly that has not been seen yet. Then select batch_size lines starting from that index. That, in combination with your suggestion of getting the positions of \n should work indeed!

I am not sure whether I want to put time into this, though, seeing that nlp is the preferred way to go.
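
For anyone who does want to try that route, here is a rough sketch of the offset-index idea (a hypothetical class, not part of this PR): record each line's starting byte offset once, then seek to it on demand instead of keeping a cache in memory.

from torch.utils.data import Dataset


class OffsetIndexedTextDataset(Dataset):
    def __init__(self, file_path: str):
        self.file_path = file_path
        # Record the byte offset at which each line starts (one pass over the file).
        self.offsets = [0]
        with open(file_path, "rb") as f:
            for line in f:
                self.offsets.append(self.offsets[-1] + len(line))
        self.offsets.pop()  # the last entry points past the final line

    def __len__(self) -> int:
        return len(self.offsets)

    def __getitem__(self, idx: int) -> str:
        # Random access via seek; nothing is cached between calls.
        with open(self.file_path, "rb") as f:
            f.seek(self.offsets[idx])
            return f.readline().decode("utf-8").rstrip()

A sampler that shuffles batch start indices, as described above, could then serve each batch with a single contiguous read.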

@chiyuzhang94

chiyuzhang94 commented Sep 8, 2020

@chiyuzhang94 I am not sure why the memory is not clearing after using clearcache. It might be that you still have to call the garbage collector after clearing the cache; you can try that.

It is true that I had not thought about multi-node support, so you will indeed have multiple in-memory caches, one per process. I do not think it is easy to bypass that, short of restructuring all of the code to work with a dedicated reading process, i.e. a separate process that fetches lines from the data file.

As has been said before, though, it is now recommended to switch over to https://github.com/huggingface/nlp which allows for on-disk datasets which are fast and have a low memory footprint.

Hi @BramVanroy ,

Thanks for your suggestion.

I looked at the nlp tool.
I didn't find an example of loading a text file for LM pre-training.
I adapted the dataset loading class like this:

class DatasetNLP(Dataset):
    def __init__(self, filename, cache_dir, args):
        self.dataset = load_dataset('text', data_files= filename, cache_dir=cache_dir)["train"]["text"]

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        line = self.dataset[index]
        return line

I am wondering whether this is the optimal way to use nlp with a PyTorch dataloader.

@mrm8488
Contributor

mrm8488 commented Sep 8, 2020

I used that approach on my way to training a LM (RoBERTa-like) from scratch. I didn't modify the dataloader. It works for some iterations, but sooner or later it ends with a kind of CUBLAS error.

@thomwolf thomwolf closed this Sep 9, 2020
@thomwolf thomwolf reopened this Sep 9, 2020
@thomwolf
Member

thomwolf commented Sep 9, 2020

I'll start updating the examples to use the datasets library as soon as our new nlp release is out (probably today).

Your example @chiyuzhang94 is ok but by doing self.dataset = load_dataset('text', data_files= filename, cache_dir=cache_dir)["train"]["text"] you are loading all the dataset in RAM which is too bad because nlp can do memory mapping from drive.

You can directly use the dataset in a data loader by using set_format(type='torch'). More information is here: https://huggingface.co/nlp/master/quicktour.html#formatting-the-dataset
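
A rough sketch of that workflow (the tokenizer, file name, and sequence length are illustrative; padding to a fixed length keeps the default DataLoader collation happy):

import torch
from nlp import load_dataset  # the library was later renamed to `datasets`
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Select the 'train' split: indexing the DatasetDict itself with an int fails.
dataset = load_dataset("text", data_files="test.txt", cache_dir="./")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512),
    batched=True,
)
dataset.set_format(type="torch", columns=["input_ids", "attention_mask"])

dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
batch = next(iter(dataloader))  # dict of input_ids / attention_mask tensors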

@BramVanroy
Collaborator

I'll start updating the examples to use the datasets library as soon as our new nlp release is out (probably today).
Your example @chiyuzhang94 is ok but by doing self.dataset = load_dataset('text', data_files= filename, cache_dir=cache_dir)["train"]["text"] you are loading all the dataset in RAM which is too bad because nlp can do memory mapping from drive.
You can directly use the dataset in a data loader by using set_format(type='torch'). More information is here: https://huggingface.co/nlp/master/quicktour.html#formatting-the-dataset

Hi, I was wondering, is it possible to finish the lazy dataloader today?
I am a little bit eager for this function.
I would really appreciate your help. Thanks!

No, that is not possible. You cannot expect a company to open-source a great product and at the same time implement features within the day.

As said numerous times in this topic, try out the nlp repository instead. It will help you out with any memory issues that you might have.

@chiyuzhang94

I'll start updating the examples to use the datasets library as soon as our new nlp release is out (probably today).

Your example @chiyuzhang94 is ok but by doing self.dataset = load_dataset('text', data_files= filename, cache_dir=cache_dir)["train"]["text"] you are loading all the dataset in RAM which is too bad because nlp can do memory mapping from drive.

You can directly use the dataset in a data loader by using set_format(type='torch'). More information is here: https://huggingface.co/nlp/master/quicktour.html#formatting-the-dataset

Hi @thomwolf ,
Thanks for your suggestion.

I tried to implement this to load my text file. This test.txt is a simple sample where each line is a sentence.

dataset = load_dataset('text', data_files='test.txt', cache_dir="./")
dataset.set_format(type='torch',columns=["text"])
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
next(iter(dataloader))

But the dataloader cannot yield a sample and the error is:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-28-388aca337e2f> in <module>
----> 1 next(iter(dataloader))

/Library/Python/3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    343 
    344     def __next__(self):
--> 345         data = self._next_data()
    346         self._num_yielded += 1
    347         if self._dataset_kind == _DatasetKind.Iterable and \

/Library/Python/3.7/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    383     def _next_data(self):
    384         index = self._next_index()  # may raise StopIteration
--> 385         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    386         if self._pin_memory:
    387             data = _utils.pin_memory.pin_memory(data)

/Library/Python/3.7/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

/Library/Python/3.7/site-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

KeyError: 0

dataset.set_format(type='torch',columns=["text"]) returns a log that says:
Set __getitem__(key) output type to torch for ['text'] columns (when key is int or slice) and don't output other (un-formatted) columns.

I noticed the dataset is DatasetDict({'train': Dataset(features: {'text': Value(dtype='string', id=None)}, num_rows: 44)}).
Each sample can be accessed by dataset["train"]["text"].

I don't know how to modify this code to load the text file. Could you please give me any suggestions?

@BramVanroy
Collaborator

BramVanroy commented Sep 10, 2020

@chiyuzhang94 Can you please ask your question either on the forums or on the respective repository? Your question is not a transformers question anymore, nor should PRs be used for general questions like this.

@chiyuzhang94

@chiyuzhang94 Can you please ask your question either on the forums or on the respective repository? Your question is not a transformers question anymore, nor should PRs be used for general questions like this.

Sure. Thanks for your investigation. I posted this question here: huggingface/datasets#610 (comment). @BramVanroy @thomwolf

@stale

stale bot commented Nov 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 9, 2020
@stale stale bot closed this Nov 21, 2020