This dataset typically contains text extracted from German Wikipedia: approximately 100,000 documents with titles, tables, and images removed to leave clean, plain text. It is widely used by researchers for tasks such as building a set of unique German words or tokens for language modeling.
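As a rough illustration of the vocabulary-building task, the following minimal Python sketch counts unique tokens across a collection of plain-text documents. The function names (`tokenize`, `build_vocabulary`), the `min_count` parameter, the letters-only tokenization rule, the lowercasing step, and the two sample sentences are all assumptions made for this example; they are not part of the dataset or of any particular toolkit, and how the documents are loaded from disk is deliberately left out.

```python
import re
from collections import Counter

# Assumption: a token is a run of Unicode letters. In Python 3 this
# pattern also matches German umlauts and the sharp s.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase the text and split it into letter-only tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(documents, min_count=1):
    """Map each token to its frequency, dropping tokens seen fewer than min_count times."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return {token: n for token, n in counts.items() if n >= min_count}


# Hypothetical stand-ins for two cleaned documents.
sample_docs = [
    "Berlin ist die Hauptstadt von Deutschland.",
    "Die Straße führt nach Berlin.",
]

vocab = build_vocabulary(sample_docs)
frequent = build_vocabulary(sample_docs, min_count=2)
```

Lowercasing merges sentence-initial and mid-sentence forms of the same word, but it also discards the capitalization that marks German nouns. For a language model that should keep that distinction, the `.lower()` call can be dropped.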