Last modified: February 26, 2021

Check if a String's Language is English in Python

Hello

In this tutorial, I will show you how to detect the language a string is written in.
To do that, we'll use the Googletrans library.

Googletrans is an unofficial Python library that wraps the Google Translate web API and provides features such as translation and language detection. In our case, we'll use the detect() method.

Let's get started

Installing Googletrans

Install via pip:


pip install googletrans==3.1.0a0
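
After installing, it can help to confirm which version pip actually picked up, since the pinned 3.1.0a0 is a pre-release. Here is a minimal sketch using the standard library's importlib.metadata (Python 3.8+). The helper name installed_version is only for illustration and is not part of Googletrans.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("googletrans")
if found is None:
    print("googletrans is not installed")
else:
    print("googletrans version:", found)
```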

How to use the detect() method

The detect() method returns the language of the text and the confidence.

Let me show you how to use it.


from googletrans import Translator

detector = Translator()

dec_lan = detector.detect('이 문장은 한글로 쓰여졌습니다.')

print(dec_lan)

Output:


Detected(lang=ko, confidence=1.0)

As you can see, the method detected Korean (ko) with a confidence of 1.0.

The confidence is a value between 0.0 and 1.0; the higher the value, the more certain the detection.

Print the language:


print(dec_lan.lang)

Output:


ko

Print the confidence:


print(dec_lan.confidence)

Output:


1.0

You can also detect multiple strings:


sentences = ["I see cats", "bonjour mon chat"]
# Passing a list returns one Detected object per string
dec_lan = detector.detect(sentences)

for dec in dec_lan:
    print(dec)

Output:


Detected(lang=en, confidence=1.0)
Detected(lang=fr, confidence=1.0)
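
Once you have a list of detections, you may want to sort the strings by language. Here is a rough sketch of one way to do that. It assumes the results come back in the same order as the input list, and it uses a stand-in FakeDetected tuple (a hypothetical name, not part of Googletrans) so that it runs without a network call.

```python
from collections import defaultdict, namedtuple

# Stand-in for the objects returned by detect(); only .lang and .confidence are used.
FakeDetected = namedtuple("FakeDetected", ["lang", "confidence"])


def group_by_language(texts, detections):
    """Map each detected language code to the list of texts detected as that language."""
    groups = defaultdict(list)
    # Assumption: detections[i] is the result for texts[i]
    for text, detected in zip(texts, detections):
        groups[detected.lang].append(text)
    return dict(groups)


sentences = ["I see cats", "bonjour mon chat", "I like dogs"]
# In real use this would be: detections = detector.detect(sentences)
detections = [
    FakeDetected("en", 1.0),
    FakeDetected("fr", 1.0),
    FakeDetected("en", 1.0),
]

print(group_by_language(sentences, detections))
```

In a real script, replace the hard-coded detections list with the result of detector.detect(sentences).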

If you want to show the full language name, you need to follow these steps:

Define a dictionary that maps language codes to language names (Googletrans also ships a similar mapping as googletrans.LANGUAGES):


LANGUAGES = {
    'af': 'afrikaans',
    'sq': 'albanian',
    'am': 'amharic',
    'ar': 'arabic',
    'hy': 'armenian',
    'az': 'azerbaijani',
    'eu': 'basque',
    'be': 'belarusian',
    'bn': 'bengali',
    'bs': 'bosnian',
    'bg': 'bulgarian',
    'ca': 'catalan',
    'ceb': 'cebuano',
    'ny': 'chichewa',
    'zh-cn': 'chinese (simplified)',
    'zh-tw': 'chinese (traditional)',
    'co': 'corsican',
    'hr': 'croatian',
    'cs': 'czech',
    'da': 'danish',
    'nl': 'dutch',
    'en': 'english',
    'eo': 'esperanto',
    'et': 'estonian',
    'tl': 'filipino',
    'fi': 'finnish',
    'fr': 'french',
    'fy': 'frisian',
    'gl': 'galician',
    'ka': 'georgian',
    'de': 'german',
    'el': 'greek',
    'gu': 'gujarati',
    'ht': 'haitian creole',
    'ha': 'hausa',
    'haw': 'hawaiian',
    'iw': 'hebrew',
    'he': 'hebrew',
    'hi': 'hindi',
    'hmn': 'hmong',
    'hu': 'hungarian',
    'is': 'icelandic',
    'ig': 'igbo',
    'id': 'indonesian',
    'ga': 'irish',
    'it': 'italian',
    'ja': 'japanese',
    'jw': 'javanese',
    'kn': 'kannada',
    'kk': 'kazakh',
    'km': 'khmer',
    'ko': 'korean',
    'ku': 'kurdish (kurmanji)',
    'ky': 'kyrgyz',
    'lo': 'lao',
    'la': 'latin',
    'lv': 'latvian',
    'lt': 'lithuanian',
    'lb': 'luxembourgish',
    'mk': 'macedonian',
    'mg': 'malagasy',
    'ms': 'malay',
    'ml': 'malayalam',
    'mt': 'maltese',
    'mi': 'maori',
    'mr': 'marathi',
    'mn': 'mongolian',
    'my': 'myanmar (burmese)',
    'ne': 'nepali',
    'no': 'norwegian',
    'or': 'odia',
    'ps': 'pashto',
    'fa': 'persian',
    'pl': 'polish',
    'pt': 'portuguese',
    'pa': 'punjabi',
    'ro': 'romanian',
    'ru': 'russian',
    'sm': 'samoan',
    'gd': 'scots gaelic',
    'sr': 'serbian',
    'st': 'sesotho',
    'sn': 'shona',
    'sd': 'sindhi',
    'si': 'sinhala',
    'sk': 'slovak',
    'sl': 'slovenian',
    'so': 'somali',
    'es': 'spanish',
    'su': 'sundanese',
    'sw': 'swahili',
    'sv': 'swedish',
    'tg': 'tajik',
    'ta': 'tamil',
    'te': 'telugu',
    'th': 'thai',
    'tr': 'turkish',
    'uk': 'ukrainian',
    'ur': 'urdu',
    'ug': 'uyghur',
    'uz': 'uzbek',
    'vi': 'vietnamese',
    'cy': 'welsh',
    'xh': 'xhosa',
    'yi': 'yiddish',
    'yo': 'yoruba',
    'zu': 'zulu',
}

Now let's detect the language of the string and print the full name of the language.


detector = Translator()

dec_lan = detector.detect('이 문장은 한글로 쓰여졌습니다.')

print(LANGUAGES[dec_lan.lang])

Output:


korean
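
Note that LANGUAGES[dec_lan.lang] raises a KeyError if the detected code is not in the dictionary. Here is a small sketch of a more forgiving lookup. It assumes the detector might return a code with different casing (for example zh-CN) or a code the table does not contain. The helper name language_name is only for illustration, and the dictionary here is a short excerpt so the block runs on its own.

```python
# A small excerpt of the LANGUAGES dictionary, so this block runs on its own.
LANGUAGES = {
    'en': 'english',
    'ko': 'korean',
    'zh-cn': 'chinese (simplified)',
}


def language_name(code, table=LANGUAGES):
    """Return the full name for a language code, or the code itself if it is unknown."""
    # Lowercase the code first, because the table keys are all lowercase.
    return table.get(code.lower(), code)


print(language_name('ko'))      # korean
print(language_name('zh-CN'))   # chinese (simplified)
print(language_name('xx'))      # xx (unknown code is returned unchanged)
```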

Check if a string is in English


dec_lan = detector.detect('Googletrans is a free and unlimited python library that implemented Google Translate API')

if dec_lan.lang == "en" and dec_lan.confidence == 1:
    print('Yes! it is')
else:
    print('No! it is not')

Output:


Yes! it is
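
Requiring a confidence of exactly 1 is quite strict, so short or mixed-language strings may fail the check even when they are English. Here is a sketch of a looser check with a threshold. The helper is_english, the 0.9 default, and the FakeDetected stand-in are all assumptions for illustration, so pick a threshold that suits your own data.

```python
from collections import namedtuple

# Stand-in for the object returned by detect(); only .lang and .confidence are used.
FakeDetected = namedtuple("FakeDetected", ["lang", "confidence"])


def is_english(detected, min_confidence=0.9):
    """Return True if the detection is English with at least min_confidence."""
    if detected.lang != "en":
        return False
    # Assumption: confidence could be missing (None); treat that as not confident enough.
    if detected.confidence is None:
        return False
    return detected.confidence >= min_confidence


print(is_english(FakeDetected("en", 1.0)))    # True
print(is_english(FakeDetected("en", 0.62)))   # False
print(is_english(FakeDetected("ko", 1.0)))    # False
```

With the real library, you would pass the result of detector.detect(...) instead of a FakeDetected value.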