Last modified: Jan 10, 2023 By Alexander Williams
Check if a String's Language is English in Python
Hello
In this tutorial, I will show you how to check which language a string is written in.
To do that, we will work with the Googletrans library.
Googletrans is an unofficial Python library that uses the Google Translate web API and provides Google Translate features such as translating text and detecting languages. In our case, we'll use the detect() method.
Let's get started
Installing Googletrans
Install via pip:
pip install googletrans==3.1.0a0
How to use the detect() method
The detect() method returns the language of the text and the confidence.
Let me show you how to use it.
from googletrans import Translator
detector = Translator()
dec_lan = detector.detect('이 문장은 한글로 쓰여졌습니다.')
print(dec_lan)
Output:
Detected(lang=ko, confidence=1.0)
As you can see, the method detected the language as ko (Korean) with a confidence of 1.0.
The confidence is a value between 0.0 and 1.0.
Print the language:
print(dec_lan.lang)
Output:
ko
Print the confidence:
print(dec_lan.confidence)
Output:
1.0
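If you want both values on one line, here is a minimal sketch of a formatting helper. The name describe() is my own, not part of Googletrans, and the sample object is only a stand-in for what detect() returns. I am also assuming that the confidence can sometimes be missing (None), so the helper handles that case.

```python
from types import SimpleNamespace


def describe(result):
    """Format a detection result as 'code (NN%)', or just 'code' without a confidence."""
    # Assumption: confidence may be None when the service does not report one.
    if result.confidence is None:
        return result.lang
    return f"{result.lang} ({result.confidence:.0%})"


# Stand-in for the object returned by detector.detect(); not a real API call.
sample = SimpleNamespace(lang="ko", confidence=1.0)
print(describe(sample))  # ko (100%)
```

With the real library, you would pass the value returned by detector.detect() instead of the stand-in.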
Detect multiple strings:
sentences = ["I see cats", "bonjour mon chat"]
dec_lan = detector.detect(sentences)
for dec in dec_lan:
    print(dec.lang)
Output:
en
fr
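When you detect a list, the results come back in the same order as the input, so it can help to keep each sentence next to its language. Here is a rough sketch of that idea. detect_languages() and fake_detector are hypothetical names of mine, and the fake detector only mimics detect() for two known inputs so the example runs offline.

```python
from types import SimpleNamespace


def detect_languages(sentences, detector):
    """Return a {sentence: language code} mapping for a list of strings."""
    results = detector.detect(sentences)  # expected: one result per input string
    return {text: result.lang for text, result in zip(sentences, results)}


# Hypothetical offline stand-in that mimics detect() for two known inputs.
_known = {"I see cats": "en", "bonjour mon chat": "fr"}
fake_detector = SimpleNamespace(
    detect=lambda texts: [SimpleNamespace(lang=_known[t]) for t in texts]
)

print(detect_languages(["I see cats", "bonjour mon chat"], fake_detector))
```

In real use, you would pass a Translator() instance as the detector argument instead of fake_detector.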
If you want to show the full language name, follow these steps:
First, define a dictionary that maps language codes to language names:
LANGUAGES = {
'af': 'afrikaans',
'sq': 'albanian',
'am': 'amharic',
'ar': 'arabic',
'hy': 'armenian',
'az': 'azerbaijani',
'eu': 'basque',
'be': 'belarusian',
'bn': 'bengali',
'bs': 'bosnian',
'bg': 'bulgarian',
'ca': 'catalan',
'ceb': 'cebuano',
'ny': 'chichewa',
'zh-cn': 'chinese (simplified)',
'zh-tw': 'chinese (traditional)',
'co': 'corsican',
'hr': 'croatian',
'cs': 'czech',
'da': 'danish',
'nl': 'dutch',
'en': 'english',
'eo': 'esperanto',
'et': 'estonian',
'tl': 'filipino',
'fi': 'finnish',
'fr': 'french',
'fy': 'frisian',
'gl': 'galician',
'ka': 'georgian',
'de': 'german',
'el': 'greek',
'gu': 'gujarati',
'ht': 'haitian creole',
'ha': 'hausa',
'haw': 'hawaiian',
'iw': 'hebrew',
'he': 'hebrew',
'hi': 'hindi',
'hmn': 'hmong',
'hu': 'hungarian',
'is': 'icelandic',
'ig': 'igbo',
'id': 'indonesian',
'ga': 'irish',
'it': 'italian',
'ja': 'japanese',
'jw': 'javanese',
'kn': 'kannada',
'kk': 'kazakh',
'km': 'khmer',
'ko': 'korean',
'ku': 'kurdish (kurmanji)',
'ky': 'kyrgyz',
'lo': 'lao',
'la': 'latin',
'lv': 'latvian',
'lt': 'lithuanian',
'lb': 'luxembourgish',
'mk': 'macedonian',
'mg': 'malagasy',
'ms': 'malay',
'ml': 'malayalam',
'mt': 'maltese',
'mi': 'maori',
'mr': 'marathi',
'mn': 'mongolian',
'my': 'myanmar (burmese)',
'ne': 'nepali',
'no': 'norwegian',
'or': 'odia',
'ps': 'pashto',
'fa': 'persian',
'pl': 'polish',
'pt': 'portuguese',
'pa': 'punjabi',
'ro': 'romanian',
'ru': 'russian',
'sm': 'samoan',
'gd': 'scots gaelic',
'sr': 'serbian',
'st': 'sesotho',
'sn': 'shona',
'sd': 'sindhi',
'si': 'sinhala',
'sk': 'slovak',
'sl': 'slovenian',
'so': 'somali',
'es': 'spanish',
'su': 'sundanese',
'sw': 'swahili',
'sv': 'swedish',
'tg': 'tajik',
'ta': 'tamil',
'te': 'telugu',
'th': 'thai',
'tr': 'turkish',
'uk': 'ukrainian',
'ur': 'urdu',
'ug': 'uyghur',
'uz': 'uzbek',
'vi': 'vietnamese',
'cy': 'welsh',
'xh': 'xhosa',
'yi': 'yiddish',
'yo': 'yoruba',
'zu': 'zulu',
}
Now let's detect the language of the string and print the full name of the language.
detector = Translator()
dec_lan = detector.detect('이 문장은 한글로 쓰여졌습니다.')
print(LANGUAGES[dec_lan.lang])
Output:
korean
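Indexing the dictionary directly raises a KeyError when the detected code is not one of its keys. Here is a small sketch of a safer lookup. language_name() is a name I made up, the dictionary is only a short subset repeated so the example runs on its own, and I am assuming that a detected code may differ in letter case from the keys (for example zh-CN instead of zh-cn).

```python
# Short subset of the LANGUAGES dictionary, repeated so the sketch is self-contained.
LANGUAGES = {"en": "english", "ko": "korean", "zh-cn": "chinese (simplified)"}


def language_name(code, languages=LANGUAGES):
    """Return the full language name for a code, or the code itself if unknown."""
    # Assumption: detected codes can differ in case from the keys, so lowercase first.
    return languages.get(code.lower(), code)


print(language_name("ko"))     # korean
print(language_name("zh-CN"))  # chinese (simplified)
print(language_name("xx"))     # xx
```

Googletrans also appears to ship a similar mapping of its own (from googletrans import LANGUAGES), which may save you from typing the dictionary by hand.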
Check if a string is English
dec_lan = detector.detect('Googletrans is a free and unlimited python library that implemented Google Translate API')
# Treat the string as English only when the detector is fully confident
if dec_lan.lang == "en" and dec_lan.confidence == 1:
    print('Yes! it is')
else:
    print('No! it is not')
Output:
Yes! it is
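Requiring a confidence of exactly 1 can be too strict for short or mixed text. As an alternative, here is a minimal sketch that uses a threshold instead. is_english() and the 0.9 default are my own choices, not something Googletrans defines, and the sample results are stand-ins with illustrative values.

```python
from types import SimpleNamespace


def is_english(result, min_confidence=0.9):
    """Return True if a detection result is English with enough confidence."""
    if result.lang != "en":
        return False
    # Assumption: a missing confidence (None) counts as "not confident enough".
    return result.confidence is not None and result.confidence >= min_confidence


# Stand-ins for detector.detect() results; the values are illustrative only.
print(is_english(SimpleNamespace(lang="en", confidence=0.97)))  # True
print(is_english(SimpleNamespace(lang="fr", confidence=1.0)))   # False
```

With the real library, you would call is_english(detector.detect(text)) and tune min_confidence to fit your data.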