Code Switch

CodeSwitch is an NLP tool for language identification, POS tagging, named entity recognition, and sentiment analysis of code-mixed data.

Supported Code-Mixed Languages

We trained multilingual BERT on the LinCE dataset using Hugging Face Transformers. LinCE provides code-mixed data for four language pairs; we used three of them: Spanish-English, Hindi-English, and Nepali-English. We hope to train and add other languages and tasks in the future.

  • Spanish-English (spa-eng)
  • Hindi-English (hin-eng)
  • Nepali-English (nep-eng)

Language Codes

  • spa-eng for Spanish-English
  • hin-eng for Hindi-English
  • nep-eng for Nepali-English

Installation

Dependency

  • PyTorch >= 1.6.0
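
The version requirement above can be checked before loading any model. The following is a minimal sketch, not part of CodeSwitch: the helper names (parse_version, meets_minimum, check_torch) are hypothetical, and it assumes Python 3.8+ (for importlib.metadata) and that PyTorch is installed under its usual distribution name, torch.

```python
import re
from importlib import metadata

MINIMUM_TORCH = "1.6.0"  # minimum version listed in the dependency section


def parse_version(text):
    """Return the leading numeric components, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum=MINIMUM_TORCH):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_torch():
    """Describe whether the installed torch distribution is new enough."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return "torch is not installed"
    if meets_minimum(installed):
        return f"torch {installed} satisfies >= {MINIMUM_TORCH}"
    return f"torch {installed} is older than {MINIMUM_TORCH}"


if __name__ == "__main__":
    print(check_torch())
```

Only the leading numeric part of the version is compared, so local build suffixes such as +cu117 are ignored.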

Features & Supported Languages

  • Language Identification
    • Spanish-English
    • Hindi-English
    • Nepali-English
  • POS Tagging
    • Spanish-English
    • Hindi-English
  • NER Tagging
    • Spanish-English
    • Hindi-English
  • Sentiment Analysis
    • Spanish-English
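
Since not every task is available for every language pair, it can help to check a combination before constructing a model. The lookup below is only an illustrative sketch that transcribes the list above into a dictionary; SUPPORTED, is_supported, and tasks_for are hypothetical names and are not part of the CodeSwitch API.

```python
# Task -> supported language codes, transcribed from the feature list.
SUPPORTED = {
    "language identification": ("spa-eng", "hin-eng", "nep-eng"),
    "pos": ("spa-eng", "hin-eng"),
    "ner": ("spa-eng", "hin-eng"),
    "sentiment analysis": ("spa-eng",),
}


def is_supported(task, language_code):
    """Return True if the task is listed for the given language code."""
    return language_code in SUPPORTED.get(task.strip().lower(), ())


def tasks_for(language_code):
    """Return the tasks listed for a language code, in the order given above."""
    return [task for task, codes in SUPPORTED.items() if language_code in codes]


if __name__ == "__main__":
    for code in ("spa-eng", "hin-eng", "nep-eng"):
        print(code, "->", ", ".join(tasks_for(code)))
```

For example, is_supported("ner", "nep-eng") is False here, because Nepali-English appears only under language identification.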

Language Identification

from codeswitch.codeswitch import LanguageIdentification
# Select the Spanish-English model; use 'hin-eng' for Hindi-English
# or 'nep-eng' for Nepali-English.
lid = LanguageIdentification('spa-eng')
text = ""  # your code-mixed sentence
result = lid.identify(text)
print(result)

POS Tagging

from codeswitch.codeswitch import POS
# Select the Spanish-English model; use 'hin-eng' for Hindi-English.
pos = POS('spa-eng')
text = ""  # your code-mixed sentence
result = pos.tag(text)
print(result)

NER Tagging

from codeswitch.codeswitch import NER
# Select the Spanish-English model; use 'hin-eng' for Hindi-English.
ner = NER('spa-eng')
text = ""  # your code-mixed sentence
result = ner.tag(text)
print(result)

Sentiment Analysis

from codeswitch.codeswitch import SentimentAnalysis
# Only Spanish-English is supported for sentiment analysis.
sa = SentimentAnalysis('spa-eng')
sentence = ""  # your code-mixed sentence
result = sa.analyze(sentence)
print(result)

Acknowledgement