Hi, my name is
Faisal Amin
I am a German national who has recently returned to Germany after completing a master's degree in Singapore. With fluency in both German and English, I bring a diverse perspective honed through international experience. Currently based in Darmstadt, Germany, I am eager to leverage my skills and relocate for the right career opportunity. My academic and professional background, combined with hands-on projects and a passion for continuous learning, has equipped me with a unique blend of technical expertise and adaptability to thrive in dynamic environments.
Education
National University of Singapore (NUS)
Singapore
Master of Computing - Artificial Intelligence Specialisation
Aug 2022 – June 2024
- GPA: 4.7/5.0
- Awards: Dean’s List – Top 5% of Cohort in Academic Performance
- Relevant Modules: Neural Networks and Deep Learning (I and II), Natural Language Processing (A+), Text Mining (A+), Uncertainty Modelling in AI (A+), Big Data Analytics (A+), Applied Analytics (A+)
University of Durham
Durham, United Kingdom
Bachelor of Science in Computer Science
Oct 2017 - Aug 2020
- GPA: 4.0/4.0 (Graduated with First Class Honours)
- Awards: Outstanding Achievement Distinction, BCS Chartered Institute for IT Prize - Top 5 Student for Academic Achievement among Graduating BSc and MEng Students
ISF International School Frankfurt Rhein-Main
Frankfurt, Germany
Bilingual Diploma of the International Baccalaureate
Aug 2015 - July 2017
- Final Score: 42/45 (Equivalent to 1.0 Abitur)
ISF International School Frankfurt Rhein-Main
Frankfurt, Germany
High School Diploma with Distinction
Aug 2005 - June 2017
- GPA: 3.94 / 4.0
Work Experience
Machine Learning Engineer
Singapore
Savvy - NUS Social Impact Catalyst Nonprofit Organisation
Jan 2024 - May 2024
- Integrated a RAG Chatbot with Python and LangChain, leveraging Pinecone for the vector database, into an application offering a lesson-based curriculum for elderly users to acquire digital literacy skills
- Conducted comprehensive testing of local, API-based and AWS SageMaker-based Large Language Models with regard to scalability, pricing, reliability and performance
- Developed classical and neural network models to analyse and interpret user data as well as evaluate pacing and effectiveness of the learning program
- Synthesized a learning algorithm for the curriculum roadmap and fine-tuned reward weights based on the aforementioned user data analytics to improve user engagement and retention rates
NLP Teaching Assistant
Singapore
NUS CS5246 Text Mining
Jan 2024 – May 2024
- Mentored and provided tailored after-class support to over 100 students in a graduate-level NLP course
- Crafted nuanced assignments and projects to gauge student comprehension and mastery of NLP principles
- Administered final exams and quizzes while accurately managing logistics and grading standards
Research Assistant
Singapore
NUS Asian Institute of Digital Finance – Credit Research Initiative
Oct 2023 – Jan 2024
- Curated robust datasets by scraping international stock exchange announcements with Python and Selenium
- Employed NLP and information extraction techniques to parse relevant financial websites and documents
- Utilized LLMs like OpenAI GPT 3.5 and Llama 2 to classify default events for publicly traded companies
Data Analyst
Darmstadt, Germany
AM Group
May 2023 - July 2023, Jan 2021 - July 2022
- Analysed delivery data within delivery zones to optimize distribution of coupons using PostgreSQL, lowering the number of coupons distributed by about 35% with a profit margin increase of up to 20%
- Conducted analysis of energy costs via Python and PostgreSQL, identifying periods of exceptional fluctuations in electric expenses which led to a 14% reduction in electric costs
- Systemized a weekly delivery sales comparison chart with Microsoft Excel to track restaurant performances, highlighting underperforming restaurants and visualizing their ranking against competitor restaurants
- Compiled data from 100+ delivery vehicles and regularly presented findings to restaurant managers and stakeholders as part of our Digitalization Project using interactive dashboards in Tableau
Software Engineering Intern
Darmstadt, Germany
Software AG
July 2019 - Sep 2019
- Developed an education package composed of JavaServer Pages to upskill users on the Internet of Things
- Managed it with OpenCMS and constructed in-house usability tests for maintenance and iterative upgrading
- Facilitated seamless communication of updates and suggestions to cross-functional and international teams
Completed Projects
Twitter Bot Detection
- Extracted and generated two robust datasets from the 90GB+ industry-standard TwiBot-22 dataset
- Conducted extensive feature engineering, EDA and data visualizations to identify critical patterns of bots
- Developed two sets of multiple classifiers, one for unstructured and one for structured data, using both classical and deep neural network models, various text embedding strategies and bootstrapping
- Implemented stacking with majority voting to combine all model predictions and enhance model performance and robustness, achieving results approaching the state of the art reported in the original paper
- Developed a healthcare chatbot companion utilizing a RAG-LLM setup with OpenAI, LangChain, and Pinecone vector databases to provide personalized medical assistance
- Incorporated over 1000 scraped medical articles from respected Singaporean health websites and encyclopaedias, ensuring comprehensive, relevant and up-to-date information for users
- Engineered a system that analyses user-uploaded medical reports, leverages top-k relevant knowledge from Pinecone, and incorporates conversation history to deliver personalized context for more accurate responses
- Orchestrated backend integration with Python and FastAPI as well as a user-friendly frontend with React
Mental Disorder Classification
- Conducted a multi-label classification deep learning task in Python to detect depression, anxiety and neutral sentiments in Reddit and Twitter posts via classical and neural network-based approaches
- Scraped over 40,000 Reddit posts via subreddit-based queries to train/evaluate the models and over one million tweets to evaluate transfer learning capabilities of the models on a different domain
- Created classical models like KNN and Logistic Regression, utilizing a TF-IDF representation with a maximum accuracy of 86% on Reddit data and 93% on Twitter data
- Built neural network models including MLP, CNN, RNN, Transformers and DeBERTa using fastText pretrained word embeddings with a maximum accuracy of 90% on Reddit data and 94% on Twitter data
- Completed a separate Fake News Detection project on a pre-existing dataset in a similar manner, with a larger focus on domain transfer learning and large language models like BERT and ELECTRA
WiseGuard
- Led the development of WiseGuard, a full-stack project leveraging LLMs to empower Singapore's seniors with scam prevention and awareness
- Utilized Python, OpenAI, and LangChain to integrate GPT 3.5, crafting realistic scam conversations and quizzes highlighting red flags and safety measures across six categories of scams
- Implemented Flask in the backend for response validation and processing of LLM responses
- Engineered a user-friendly frontend using Django and Jinja, optimizing usability and accessibility, while deploying the application seamlessly with Docker and Render and managing user authentication through SQLite
- Developed a modern portfolio website utilizing Next.js and Tailwind CSS
- Enhanced website interactivity and visual appeal by integrating Lottie Files, React Hot Toast, and other React libraries
- Implemented an inbuilt web form that can send me a personal email via Resend
- Ensured a highly responsive design for optimal user experience across all devices
- Deployed the website via Vercel with a custom domain
Grad Student Association of Computing Executive
- Led the organization and execution of online and physical hackathons while actively engaging with potential sponsors to secure support for events and networking opportunities for participants
- Fostered a sense of community via social events, facilitating connections among members and allowing for interdisciplinary discussions
- Conceptualized innovative challenges tailored to members' interests and skill levels, encouraging participation and collaboration during events
ConTra - Self-Supervised Contrastive Approach to Text Classification using Transformers
- Designed and implemented ConTra, a self-supervised contrastive learning approach using transformers for text classification tasks as an alternative to masked language modeling
- Explored and analyzed the effectiveness of 7 different text data augmentation techniques like synonym substitution, word deletion, and contextual replacements for generating positive and negative sample pairs
- Developed a transformer encoder model trained with a contrastive loss objective inspired by SimCLR, utilizing up to 4 chained augmentations to learn robust text representations
- Conducted extensive experiments comparing ConTra against DistilBERT, achieving competitive performance (less than 1% accuracy difference) on a text classification dataset when pre-trained on limited data
Topic Modelling on Social Media Customer Service
- Performed data wrangling and preprocessing on a dataset of 3 million tweets using PySpark's parallel processing to group relevant tweets into customer support conversations
- Applied unsupervised learning techniques like K-Means clustering, Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-Negative Matrix Factorization (NMF) to identify major topics in the Twitter customer support data
- Conducted comprehensive data analysis and generated insights per topic, including response time analysis, sentiment-based resolution/escalation rate studies, and identification of private vs urgent exchanges
- Provided data-driven recommendations to enhance customer service strategies on social media, such as optimizing resource allocation, improving response times, addressing dissatisfaction for time-sensitive issues, and increasing visibility of private support channels
Fake News Detection
- Designed and implemented 9 different deep learning models including MLPs, CNNs, LSTMs, Transformers, and pre-trained language models like BERT and RoBERTa for multi-label fake news classification
- Performed comprehensive data preprocessing, exploratory analysis, and augmentation techniques on two fake news datasets from different domains to enhance model performance
- Conducted extensive experiments to evaluate model effectiveness at solving the fake news detection task and transferring capabilities across domains, achieving state-of-the-art F1 scores
- Optimized model architectures, hyperparameters, and training strategies like multi-task learning, regularization, and early stopping to improve generalization and prevent overfitting
AWS PartyRock Generative AI Hackathon - Chimera Lab
- Leveraged AWS PartyRock to develop Chimera Lab, an educational AI tool for young learners where they combine animal body parts, triggering educational content and generating unique final creature descriptions and images
- Employed prompt engineering techniques within PartyRock to seamlessly integrate user choices, generating a cohesive creature description and accompanying image
AI Singapore National AI Student Challenge 2024
- Analysed a real-world dataset provided by PetFinder.my to derive actionable insights regarding pet adoption rates
- Conducted Data Cleaning, Exploratory Data Analysis, Feature Engineering, Feature Selection, Model Building, and Model Evaluation
- Effectively managed cleaning and merging of multiple datasets
- Tested and evaluated multiple classical machine learning models using a pipeline, cross-validation, and grid search
American Politics Classifier
- Developed multiple binary classification deep learning models in Python capable of identifying pro-Trump and pro-Biden sentiments in tweets
- Scraped over 40,000 tweets using hashtag-based queries and applied NLP preprocessing steps to build higher quality training/validation/testing datasets
- Formulated multiple models (MLP, CNN, LSTM, BERT) with different feature engineering techniques (bag-of-words, TF-IDF, and word embeddings) to achieve a maximum accuracy of 80%
- Created ChatCraft, a conversational simulation tool aimed at enhancing communication skills through AI-driven interactions
- Designed and developed a user-friendly frontend interface using Streamlit, optimizing the user experience and accessibility of the application
- Leveraged Python, LangChain, and OpenAI for backend development, enabling seamless integration of GPT 3.5 to simulate realistic conversations
- Conceptualized and developed a full-stack event-to-volunteer matching application with personalized task generation
- Leveraged SQLAlchemy in Python to interact with PostgreSQL databases
- Utilized OpenAI GPT 3.5 and LangChain to generate personalized tasks for each matched event
- Utilized FastAPI for streamlined backend routing and Django with Jinja for frontend development
Flatland Challenge
- Implemented DQN-based reinforcement learning models (DQN, Double DQN, Dueling DQN, Dueling Double DQN) to solve the Vehicle Rescheduling Problem in the Flatland Challenge
- Constructed a multi-agent 2D grid world using Python to optimize agents navigating train networks
- Formulated optimal sparse reward functions to balance local single and global multi-agent reward signals
AI Art Hack 2023
- Leveraged ChatGPT 3.5 for innovative story plot generation
- Utilized Midjourney to generate relevant images, employing prompt engineering for stylization choices and ensuring consistency in story character portrayals
- Employed Canva to craft a visually appealing 2-page comic and insert appropriate text bubbles
Movie Recommender System
- Performed exploratory data analysis and preprocessing on large user, movie, and ratings datasets to prepare data for building a movie recommender system
- Developed and optimized two machine learning approaches for movie recommendations: an ensemble SVM model and a multi-layer perceptron neural network, achieving high precision scores of 0.68 and 0.69 respectively
- Implemented multi-armed bandit algorithms with epsilon-greedy strategies and exploration functions to balance exploration and exploitation for personalized movie recommendations, attaining up to 80% overlap with most-liked movies
- Evaluated movie recommendation models using precision, recall, and F1-score metrics, and analyzed the strengths and limitations of machine learning vs multi-armed bandit approaches
Singlife Datathon 2023
- Analyzed a real-world anonymized dataset comprising 300+ client features to extract actionable insights aimed at enhancing customer experience and increasing insurance product sales
- Executed an end-to-end data analysis pipeline encompassing data cleaning, exploratory data analysis, feature engineering, and model building, resulting in robust predictive models to drive business decisions
- Demonstrated proficiency in handling diverse missing or invalid data values within the dataset, ensuring data integrity and reliability throughout the analysis process
- Applied Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance challenges, enhancing the performance and generalization of predictive models
- Leveraged pipeline architecture, cross-validation, and grid search techniques to systematically evaluate and compare multiple classical machine learning models, optimizing model performance and scalability
Ongoing + Upcoming Projects
Knowledge Nexus
I will develop a personal research workspace that empowers users to effortlessly gather, analyze, and interact with diverse media types. Users will be able to upload YouTube links, PDFs containing research papers, articles, or books, and audio files, with the system automatically extracting and cataloging pertinent information from each upload into a vector database. The platform will generate a comprehensive summary of each uploaded item, offering condensed insights for efficient comprehension. Central to the experience is the integration of a RAG-LLM chatbot. This chatbot, equipped with access to the uploaded media's information, will guide users to relevant sections within the media, facilitating swift access to answers and insights.
Skills
AI/ML
Frontend/Backend Development
More Programming
Technologies
General
Languages
Certifications
AI
- AI For Industry - Foundations in AI
- AI For Industry - Literacy in AI
- IBM Machine Learning Professional Certificate
- Google Data Analytics Professional Certificate
Other
- AWS Certified Solutions Architect – Associate
- Google IT Automation with Python
- Google Fundamentals of Digital Marketing
- TikTok Tech Immersion 2023
- TikTok Tech Immersion 2024
Let's Connect
I'm currently exploring new career opportunities and would love to connect with potential employers. Feel free to reach out through the contact form. I'm also happy to network with fellow developers, learners, and professionals in the tech community. Whether you have an exciting opportunity, seek collaboration on a project, or simply want to discuss the latest trends and technologies, I welcome the chance to connect and exchange ideas.