I work as an Applied Research Scientist at Amazon with the India Machine Learning team.
Previously, I worked as a Machine Learning Engineer at Jio Haptik on fundamental Conversational AI problems using Deep Learning. I built the Intent Detection System for Haptik’s NLU Engine, which was 25% more accurate than their previous system, and owned it from research to production.
I have authored research papers accepted at top-tier venues, including the EMNLP NLP-OSS workshop, the EMNLP Insights workshop, the EACL LT-EDI workshop and FIRE.
I am also the creator of the open-source iNLTK library, which provides out-of-the-box support for various NLP tasks in 13 low-resource Indic languages. The library has 70,000+ downloads, and 700+ stars and 100+ forks on GitHub.
Prior to Jio Haptik, I worked at Goldman Sachs with the User Experience and Productivity team on Analytics for Desktop Assistant, a productivity tool used firm-wide.
I hold an Advanced Certification in AI and ML from the International Institute of Information Technology (IIIT-Hyderabad) and a Bachelor’s in Computer Science from PEC University of Technology.
I am interested in applying Machine Learning to problems that impact millions of people, and I keep making small open-source contributions toward that goal.
Accepted at EMNLP-2020’s NLP-OSS workshop |
iNLTK: Natural Language Toolkit for Indic Languages
Gaurav Arora [Paper] [GitHub] |
Accepted at EMNLP-2020’s Insights workshop |
HINT3: Raising the bar for Intent Detection in the Wild
Gaurav Arora, Chirag Jain, Manas Chaturvedi, Krupal Modi [Paper] [GitHub] |
Accepted at Dravidian Codemix HASOC @ FIRE-2020 |
Pre-training ULMFiT on Synthetically Generated Code-Mixed Data for Hate Speech Detection
Gaurav Arora [Paper] [GitHub] |
Accepted at LT-EDI @ EACL-2021 |
Spartans@LT-EDI-EACL2021: Inclusive Speech Detection using Pretrained Language Models
Gaurav Arora*, Megha Sharma* [Paper] [GitHub] |
2018 - 2019 |
Advanced Certification in Artificial Intelligence and Machine Learning
International Institute of Information Technology (IIIT-Hyderabad) |
2014 - 2018 |
B.Tech in Computer Science
PEC University of Technology |
2012 - 2014 | GMSSS-16, Chandigarh |
Apr 2021 - Present | Amazon, Applied Scientist |
July 2019 - Apr 2021 | Jio Haptik, Machine Learning Engineer |
June 2018 - July 2019 | Goldman Sachs, Technology Analyst |
May 2017 - Oct 2017 | Goldman Sachs, Technology Analyst Intern |
Nov 2016 - Mar 2018 | Researchshala, Co-Founder and CTO |
Natural Language Toolkit for Indic Languages (iNLTK)
• iNLTK aims to provide out-of-the-box support for various NLP tasks that an application developer might need for Indic languages
• iNLTK provides Data Augmentation, Sentence Similarity, Sentence Encoding, Word Embedding, Tokenization and Text Generation utilities for 13 low-resource Indic languages
• The library is backed by ULMFiT Language Models that I trained using the Fastai and PyTorch libraries, producing SOTA LM perplexity and classification accuracy in 13 Indic languages
Appreciation for iNLTK
• By Jeremy Howard and Sebastian Ruder on Twitter
• Shared widely by the community on LinkedIn
• iNLTK has 70,000+ downloads on PyPI
• A Data Augmentation post about iNLTK was trending on LinkedIn
• iNLTK was trending on GitHub in May 2019
• Shared on Reddit, Facebook, Quora, etc. by the community |
Code with AI
• A tool that predicts which techniques to use to solve a competitive programming problem correctly
• Demo video on YouTube
Appreciation for Code with AI
• By Jeremy Howard on Twitter
• By the community on Codeforces
• The tool has been used by 3000+ users |
NLP for Hindi
• Contains SOTA Language models and Classifier for Hindi
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
NLP for Sanskrit
• Contains SOTA Language models and Classifier for Sanskrit
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
NLP for Nepali
• Contains SOTA Language models and Classifier for Nepali
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
NLP for Tamil
• Contains SOTA Language models and Classifier for Tamil
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
NLP for Bengali
• Contains SOTA Language models and Classifier for Bengali
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
NLP for Punjabi
• Contains SOTA Language models and Classifier for Punjabi
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
NLP for Malayalam
• Contains SOTA Language models and Classifier for Malayalam
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
NLP for Odia
• Contains SOTA Language models and Classifier for Odia
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
NLP for Gujarati
• Contains SOTA Language models and Classifier for Gujarati
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
Mar 2021 | Indian Achievers Award 2020 from the Indian Achiever’s Forum (IAF) in the Young Achievers Category for contribution to nation building through iNLTK |
Mar 2019 | Fast.ai International Fellow for contributions to Fast.ai forums |
Dec 2018 | Top 17% rank in the Human Protein Atlas Image Classification competition on Kaggle for developing a Deep Learning model that classified mixed patterns of proteins in microscope images. The competition had 2,172 teams; I participated individually and was solely responsible for the 366th-placed solution |
Oct 2017 | 1st Prize in IEEE-Hackathon for developing a chatbot to help people with emotional decisions in life |
Feb 2016 | Top 100 among 500,000 students in IT-Olympiad, 2016 |
Oct 2016 | 2nd Prize in IEEE-Hackathon for developing an Augmented Reality application to help teachers |
Mar 2016 | All India Rank-6 in IEEE Programming League, among over 1200 undergraduate students |
Mar 2016 | 2nd Rank, CodeWars, a competitive programming event hosted by IEEE, PEC on CodeChef |
Nov 2016 - Mar 2018 | Research Scholarship of 10k per month for Personal Emotional Doctor - Bot |
May 2014 | All India Rank-885 in JEE-Mains, among 1.4 million candidates |
Aug 2014 | 1st Rank-Opener, PEC for best JEE-Mains rank among 600 students of the session 2014-2018 |
Dec 2014 | 1 Lakh Scholarship from CBSE for 96.4% marks in 12th Boards and 10 CGPA in 10th |
Dec 2014 | Letter of Appreciation from HRD Ministry, Govt. of India for 96.4% in CBSE-12th exams |
June 2011 | Catch Them Young - among the top 40 students selected from the Tricity by Infosys for a 2-week programming basics training on their campus |
Mathematics |
Discrete Structures for Computer Science, Vector Calculus, Fourier Series and Laplace Transform, Operations Research, Mathematics for Machine Learning: Linear Algebra and Multivariate Calculus (Coursera) |
Computer Science |
Data Structures and Algorithms, Computer Architecture and Organization, OOP, Microprocessor, DBMS, Operating Systems, Computer Networks, Theory of Computation, Artificial Intelligence, Computer Graphics, Mobile Computing, Fastai: Part 1 and Part 2, DeepLearning.ai by Andrew Ng, Deep Learning by Prof. Mitesh Khapra, IIT Madras |
Programming & Web |
C, C++, Python, JavaScript, TypeScript, ECMAScript 6, AngularJS, ReactJS, Angular 4, Webpack, Django with Python |
Frameworks |
PyTorch, pandas, NumPy, scikit-learn, SciPy, Fastai, Transformers library |
Last updated on 2021-10-03