MEVZU N°128 · ISTANBUL · YEAR I · VOL. III
#tokenization
§03 · 04 Wiki
§01 Glossary
BPE — Byte-Pair Encoding
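A tokenisation algorithm that builds a sub-word vocabulary by starting from single characters (or bytes) and repeatedly merging the most frequent pair of adjacent symbols into a new symbol, until the vocabulary reaches a target size.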
- EN: Byte-Pair Encoding (BPE)
- TR: BPE — Bayt Çifti Kodlama
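The merge loop described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a production tokeniser: the toy word counts are made up, each word starts as a plain tuple of characters with no end-of-word marker, and ties between equally frequent pairs go to the pair counted first. The helper names (`get_pair_counts`, `merge_pair`, `learn_bpe`) are hypothetical.

```python
from collections import Counter


def get_pair_counts(corpus):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in corpus.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(corpus, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in corpus.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    # Assumption: start from single characters; each word is a tuple of symbols.
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(corpus)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; first one wins ties
        corpus = merge_pair(corpus, best)
        merges.append(best)
    return merges, corpus


merges, corpus = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
print(merges)  # → [('e', 's'), ('es', 't'), ('l', 'o')]
```

Note that the second merge combines a previously merged symbol (`es`) with a character, which is why the vocabulary grows beyond character pairs. Encoding new text would replay the learned `merges` in order on each word; byte-level variants run the same loop over UTF-8 bytes instead of characters.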
§02 Glossary
WordPiece
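Google's sub-word algorithm, used by BERT. Like BPE, it builds its vocabulary by merging symbols, but it picks the merge that most increases the likelihood of the training data rather than simply the most frequent pair.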
- EN: WordPiece
- TR: WordPiece
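Setting training aside, BERT-style WordPiece splits each word at inference time with a greedy longest-match-first search, marking word-internal pieces with a `##` prefix. The sketch below assumes that scheme; the toy vocabulary and the function name `wordpiece_tokenize` are hypothetical, and a real tokeniser would first normalise the text and split it into words.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word; continuation pieces carry '##'."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it appears in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no valid split: the whole word becomes unknown
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary for illustration only.
vocab = {"un", "##aff", "##able", "play", "##ing"}
print(wordpiece_tokenize("unaffable", vocab))  # → ['un', '##aff', '##able']
```

If no prefix of the remaining characters is in the vocabulary, the whole word falls back to `[UNK]` rather than keeping a partial split.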
§03 Glossary
SentencePiece
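Google's language-agnostic tokeniser library, which trains BPE or unigram language-model vocabularies directly on raw text. It treats whitespace as just another symbol (shown as ▁, U+2581), so no language-specific pre-splitting is needed and detokenisation is lossless.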
- EN: SentencePiece
- TR: SentencePiece
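Treating whitespace as an ordinary symbol is what makes the round trip lossless: spaces are escaped to ▁ before segmentation and restored by plain concatenation afterwards. The sketch below shows only that escaping step, not SentencePiece's actual API or its BPE/unigram segmentation. The dummy leading space mimics what we assume is the library's default behaviour, and the piece list is a hypothetical segmentation that a trained model might or might not produce. The names `normalize` and `detokenize` are illustrative.

```python
# U+2581 LOWER ONE EIGHTH BLOCK ("▁"), the marker SentencePiece displays for spaces.
META = "\u2581"


def normalize(text):
    """Prepend a dummy space (assumed default), then escape every space as a symbol."""
    return META + text.replace(" ", META)


def detokenize(pieces):
    """Concatenate pieces, turn the marker back into spaces, drop the dummy prefix."""
    text = "".join(pieces).replace(META, " ")
    return text[1:] if text.startswith(" ") else text


text = "Hello world."
# Hypothetical segmentation: a trained model would choose its own pieces.
pieces = ["\u2581Hello", "\u2581wor", "ld", "."]
assert "".join(pieces) == normalize(text)
print(detokenize(pieces))  # → Hello world.
```

Because the marker travels inside the pieces, detokenisation needs no knowledge of where a language puts spaces, which is what lets the same code handle languages written without them.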
§04 Glossary
Token
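The smallest unit a language model processes: a whole word, a word fragment, a single character, a punctuation symbol, or (in byte-level schemes) a raw byte. The model sees each token as an integer ID from its vocabulary.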
- EN: Token
- TR: Token
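Mapping tokens to IDs is a plain dictionary lookup, which the toy sketch below illustrates. It assumes the text has already been split into pieces and uses a hypothetical `<unk>` fallback with ID 0; the function names are illustrative only, and byte-level vocabularies avoid the unknown case entirely.

```python
def build_vocab(tokens):
    """Assign each distinct token string a stable integer ID (toy vocabulary)."""
    vocab = {"<unk>": 0}  # hypothetical fallback for tokens outside the vocabulary
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Look up each token's ID, falling back to the <unk> ID."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


def decode(ids, vocab):
    """Invert the vocabulary to map IDs back to token strings."""
    inverse = {i: tok for tok, i in vocab.items()}
    return [inverse[i] for i in ids]


# A word fragment, another fragment, and a symbol: "tokenisation!"
pieces = ["token", "isation", "!"]
vocab = build_vocab(pieces)
ids = encode(pieces, vocab)
print(ids)  # → [1, 2, 3]
```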