• GLUE, the General Language Understanding Evaluation benchmark.
  • WikiText: This dataset contains text extracted from Wikipedia.
  • IMDB: This dataset is suitable for binary text classification.
  • Yelp review: This dataset is suitable for multi-class text classification.
  • Text REtrieval Conference (TREC): This dataset is for question classification.
  • AG news: This dataset is suitable for topic classification. The full AG corpus contains more than 1 million news articles; the commonly used classification version labels each article with one of 4 topic classes (World, Sports, Business, and Sci/Tech).
  • DBpedia 14: This dataset contains a subset of the DBpedia dataset.