Razi Mahmood Home Page | Data Scientist | Deep Learning | Model Building

About Me

I am a graduate student in the BME Department at Rensselaer Polytechnic Institute, studying multimodal machine learning for medical imaging and beyond. Prior to this, I was an undergraduate at UC Berkeley, majoring in data science with a domain emphasis in cognition.

I am a team-oriented, research-focused graduate student with data science internship experience solving real-world problems in domains ranging from IT Ops, neurotech, and e-commerce to healthcare. I have used statistical and deep-learning-based AI models in NLP, computer vision, and medical imaging. I am interested in exploratory data analysis, particularly to answer research questions. I am also interested in working with domain experts to learn new topics, analyze their data, and communicate the findings to the corresponding business stakeholders.

Throughout my undergraduate and high school years, I completed several data science and CS projects in domains including medical imaging, computer vision, NLP, and population health. For a concise summary, please see my CV.

Education/Courses Taken

Ph.D. in Machine Learning, Dept. of Biomedical Engineering, Rensselaer Polytechnic Institute (RPI), started August 2023.

B.A. Data Science, University of California, Berkeley, May 2022

Studied Data Science with a domain emphasis in Cognition at UC Berkeley.

Here are the relevant courses I’ve taken in past semesters that are related to my experiences studying within the Data Science and Computer Science fields.

In addition, I have taken a few courses in machine learning and cloud computing on Coursera.

Course Number	Course Title	Term
CS 61A	Structure & Interpretation of Computer Programs	Spring 2019
CS 61B	Data Structures	Fall 2019
CS 61C	Great Ideas of Computer Architecture (Machine Structures)	Fall 2020
CS 70	Discrete Math & Probability	Spring 2020
CS 188	Introduction to Artificial Intelligence	Fall 2020
DATA 8	Foundations of Data Science	Fall 2018
DATA 100	Principles & Techniques of Data Science	Spring 2020
DATA 102	Data, Inference, Decision Modeling	Fall 2021
DATA 104	Human Contexts and Ethics of Data	Spring 2020
DATA 140	Probability for Data Science	Spring 2021
COGSCI 1	Introduction to Cognitive Science	Summer 2020
COGSCI C100	Basic Issues in Cognition	Spring 2021
COGSCI C131	Computational Models in Cognition	Fall 2021
ELENG 198	Introduction to Neurotechnology	Fall 2021
ENVECON 118	Introductory Applied Econometrics	Fall 2021
INFO 159	Natural Language Processing	Spring 2022

Skills

Programming Languages

Python (4+ years), Java, R, C, MATLAB, SQL, HTML, JavaScript

Tools & Databases

Jupyter Notebook, Eclipse, Visual Studio, Sublime, Deepnote, IntelliJ, MySQL, Vim, vi, Logisim, ITK-SNAP, 3D Slicer, Adobe Premiere Pro, Unity, Maya

ML Platforms, Libraries, & Models

TensorFlow, Keras, PyTorch
Pandas, Scikit-learn, SciPy, Seaborn, NumPy, NLTK, Gensim, Matplotlib (Pyplot), Tidyverse, SimpleITK, OpenCV
Word2Vec, BERT, BioBERT, ClinicalBERT, ResNet50, ResNet101, VGG16, U-Net, LSTM, Auto-encoders, OpenAI models (GPT-3/4, CLIP, DALL-E)

Cloud Platform Technologies

Docker, Flask, Linux, Git, Webservers (Tomcat), Big Data (S3, Hadoop), AWS, Familiarity with DevOps

Data Science Skills

Deep Learning, Machine Learning, Data Science, Data Analysis, Statistics, Data Visualization, Technical presentation skills
Data preparation, processing, cleansing, standardization, data analysis, ETL and visualization.

Research/Academic Experience

Fact-checking for Generative AI, RPI Pre-Graduate Research May 2023-Sept. 2023

Pre-graduate school independent research on image-driven fact-checking of AI-generated textual reports for chest X-ray imaging under the guidance of Prof. Pingkun Yan from RPI and radiologist Dr. Mannudeep Kalra from Harvard/MGH.
Developed a new fact-checking AI model (AUC=0.87) by training on pairs of real and fake report sentences with imaging to correct AI-generated reports using CLIP encodings transformed to a higher-dimensional space for classification.
Results showed a 15% quality improvement over AI-generated reports. First work to fact-check AI-generated radiology reports. Contributing the fake report dataset to open source.
The research paper was published in Proc. MLMI and presented orally at the Machine Learning in Medical Imaging (MLMI) Workshop at MICCAI 2023.

IBM Watson (Jan-May'21)

Gained NLP research experience in the IBM Watson NLP team led by Dr. Rama Akkiraju, CTO of Watson AI Ops and mentored by Xiatong Liu, Data Science Manager.
Developed a novel log anomaly classification algorithm combining BERT language modeling of IT logs with supervised contrastive learning.
The resulting algorithm achieved an overall accuracy of 97.32% on a dataset of 10,000 HDFS system logs and outperformed other machine learning algorithms. The research paper was later published at an IEEE Big Data Conference workshop (Dec. 2022).
The work used the spaCy NLP library, BERT sentence transformers from Hugging Face on PyTorch, and a supervised contrastive learning model modeled after a NeurIPS 2020 paper.

Academic Development Committee Mentor, Data Science Society (DSS), UC Berkeley (Aug'21-present)

Selected to teach Data Science basics and mentor Berkeley undergraduates on Data Science Capstone research projects.
Facilitated discussion on specific Data Science topics through mini-lectures and curated Jupyter notebooks.
Mentored two groups of 5 students on Data Science Capstone projects.
Expanded practical knowledge of EDA, visualization, modeling, machine learning, hypothesis testing.

Lab Assistant/Academic Intern - CS/Data Science, UC Berkeley (Aug'2019-May'2021)

Facilitated students’ introductory Berkeley CS experience through hands-on instruction, tutoring in office hours and CS labs, mediating online course forum discussions (CS61A), providing problem walkthroughs for class projects and bug fixing on Python Jupyter Notebooks (Data 100).
Received recognition from students for assistance with debugging, quick explanations, recapping course topics (data visualization, modelling) in feedback forms. Was able to accommodate more than half of the incoming queue during consultation hours.

Medical Imaging/Computer Vision Machine Learning Intern - IBM Almaden Research Center (June'2014-July'2016, Jan. -June '21)

Returned to the Medical Sieve Radiology Grand Challenge group at IBM Almaden Research Center to develop a new type of deep learning framework, called spatially-preserving flattening, for location-sensitive recognition of findings in chest X-rays. This was joint work with Neha Srivastava at Stanford and resulted in a publication at the ISBI'22 conference.
Earlier, under a program for middle and high school students to be mentored by researchers at IBM Almaden Research Center, I volunteered as a machine learning intern in this group.
Contributed to several medical imaging AI research projects for automatic detection of pulmonary embolism, cardiac aneurysms, and dilated cardiomyopathy in CT and echocardiography by developing new ideas and implementing them alongside IBM researchers.
The research resulting from this early mentoring experience was presented at the Synopsys Science Fair, 2014-2016, and published in international conferences (AMIA’14 PMID: 25954393, IEEE ISBI’15) at age 14, and patent disclosures were submitted.
During this experience, learned to use many pre-deep-learning statistical machine learning packages and tools such as ImageJ, and to code in Java and MATLAB to process medical data in HL7 and DICOM formats.

Summer Internships

HyperFine Research (Aug-Sept'21)

Interned as part of a team of professional data scientists to create labeled data collections for deep learning-driven anomaly recognition in brain MRI generated from their portable MRI scanner.
Developed an automated labeling algorithm for brain MRI images from their companion textual reports using language models, NLP, and vocabulary-driven concept extraction. The algorithm extracted 7200 annotations for 600 brain MRIs, achieving 88% precision and 70% recall. Used BERT and Word2Vec models and the spaCy NLP library.
Wrote Python scripts to collate the results of radiologist annotations with the original MRI DICOM files using edit distance-based name matching. The work used the Pandas and spaCy libraries.
Developed user interfaces to record ground truth anomaly labels indicated by clinicians in companion MRI reports, which led to a ten-fold decrease in annotation time.
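
To illustrate the edit distance-based name matching mentioned above, here is a minimal sketch in plain Python. It is not the internship code: the function names, the lowercase normalization, and the `max_distance` threshold are all assumptions made for illustration.

```python
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]


def best_match(name: str, candidates: List[str], max_distance: int = 3) -> Optional[str]:
    """Return the closest candidate, or None if none is within max_distance.

    The threshold and the lowercase/strip normalization are hypothetical choices.
    """
    if not candidates:
        return None
    normalized = name.strip().lower()
    scored = [(edit_distance(normalized, c.strip().lower()), c) for c in candidates]
    distance, match = min(scored)
    return match if distance <= max_distance else None
```

A threshold of this kind trades off missed matches against false matches; in practice it would need tuning on the actual file and annotation names.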

Xoran Technologies (June-Aug'21)

Developed a 3D anatomical segmentation algorithm for cone beam CT studies. Reconstructed volumes for 9 anatomical structures in the head and neck, including the eyes, maxillary sinus, and sphenoid, using a U-Net-based deep learning architecture trained on 17 CT volumes, achieving a Dice coefficient of 0.68. The work used Python with the SimpleITK, NumPy, Keras, and TensorFlow libraries.
Surveyed several image annotation tools and prepared a report. Trained colleagues on the use of ITK-SNAP and 3D Slicer for manual regional annotations.
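
For readers unfamiliar with the Dice coefficient reported above, the following is a minimal sketch of how it can be computed for one structure, assuming the predicted and ground-truth masks have been flattened into equal-length binary sequences. The function name and the convention for two empty masks are assumptions, not details from the project.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |intersection| / (|pred| + |truth|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks agree perfectly, so Dice is 1.0.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A score of 1.0 means the two masks overlap exactly and 0.0 means they share no voxels.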

SWAYD (Jan-March'20)

Worked as a computer vision content gathering intern in a team of 4 for the startup. Developed an algorithm in Python for automatically classifying foods/dishes using ImageNet-trained DL models and linking them to their respective restaurants via hashtags and geo-tags in Instagram posts.
Obtained hands-on experience in data preparation, cleansing, processing, algorithm development, and APIs/platforms (Postman, Clarifai, Google Maps API).

Projects

Over the last 7 years, I have done several projects covering data science and general CS areas. The projects done as part of work experience are proprietary and details are provided in the attached presentations. For projects done in open source or freelance, GitHub links are provided where possible for code.

Data Science Projects

Summer Internship Projects
The summer internships at Xoran Tech and Hyperfine involved development of automated annotation tooling to enable deep learning model development. More details of this work are available in the following reports and under the Experience tab:

IBM Watson AI Ops – Anomaly Detection in IT Logs
While interning with the IBM Watson NLP team, I developed a new approach, called ContrastBERT, for log anomaly classification using supervised contrastive learning on BERT-encoded log data.

The problem addressed was to recognize which of a set of IT logs were anomalous. The IT logs came in the form of free text intermixed with identifiers such as block IDs, with no definite signatures for anomalies.
Previous approaches tried to address this by extracting handcrafted features from event sequences and building classifiers on the features.
My approach built on the observation that both the order and the content of the log sentences carry information, which I modeled using Sentence-BERT. I then built a supervised contrastive encoder that differentiates between the BERT encodings of normal and abnormal IT logs. A deep learning classifier was then built on top of the learned contrastive encoder.
The resulting classifier outperformed existing log anomaly detection methods on a benchmark dataset of 10K HDFS logs, achieving an accuracy of 97.3%.
This paper was accepted at the IEEE Big Data Conference Workshop on Knowledge Discovery in Data Mining on IT Operations, Osaka, Japan, Dec. 2022.
Download related presentation and GitHub code.
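
As a rough illustration of the supervised contrastive objective described above, here is a minimal pure-Python sketch of a commonly used form of the loss. It is not the ContrastBERT implementation: the function name, the temperature value, and the assumption that embeddings are non-zero vectors are all illustrative, and whether this exact variant matches the one used in the project is an assumption.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors with at least one positive.

    embeddings: list of non-zero vectors (e.g. sentence encodings of log lines).
    labels: one class label per embedding (e.g. 0 = normal, 1 = anomalous).
    """
    # L2-normalize so that dot products become cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])

    losses = []
    n = len(normed)
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
                for j in range(n) if j != i}
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Numerically stable log-sum-exp over all other samples.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        losses.append(-sum(sims[p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0
```

The loss is small when same-label encodings sit close together and away from the other class, which is the property a downstream classifier can then exploit.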

NeuroTech Elective Project - A Deep Learning-based Sleep Stage Analyzer
Developed a deep learning-based algorithm to analyze sleep stages from EEG signals and study their variance across the population.

Adapted a 1D CNN architecture (8 CNN layers, 1 drop-out layer) to classify 5-channel EEG signals into 5 sleep stages (Awake, Stage 1, Stage 2, Stage 3, REM).
Achieved a balanced accuracy of 0.76 and a Cohen's kappa score of 0.706 for the developed network on EEG signals from the Sleep Physionet dataset (30 polysomnographic sleep recordings).
Download related presentation and GitHub code.
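
The two metrics reported above can be computed as in the following minimal sketch. It uses only the standard library and is not the project's evaluation code; the function names and the toy labels in the usage note are hypothetical.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall, so rare sleep stages count as much as common ones."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(1 for i in indices if y_pred[i] == cls) / len(indices))
    return sum(recalls) / len(recalls)


def cohens_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance agreement.

    Assumes chance agreement is below 1 (i.e. not every label and prediction
    belongs to a single class), otherwise the denominator would be zero.
    """
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)
```

For example, with true stages `["Awake", "Awake", "Stage1", "Stage1", "REM", "REM"]` and predictions `["Awake", "Awake", "Stage1", "REM", "REM", "REM"]`, the per-class recalls are 1, 0.5, and 1, and kappa works out to 0.75.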

Deep Learning Projects on Kaggle
Worked under the mentorship of Humza Iqbal, a data scientist at Securiti.ai, on several Kaggle datasets to build deep learning models for the problems below. Gained experience in deep learning model building using Keras, TensorFlow, and PyTorch.

Digit recognition using CNNs on MNIST data
ResNet50 classifier on the CIFAR-10 dataset.
U-Net-based TGS Salt deposit segmentation. Later used this model as an example to build 3D segmentation of cone beam CT.

Machine learning-driven Contraceptive Use Prediction
Worked in a three-member team to find optimal predictor variables for the use of contraceptives in a survey dataset gathered for Indonesian women for purposes of family planning rollout measures.

Experimented with logistic regression, decision trees, and random forest with PCA on features.
Implemented using Scikit-learn library. Dealt with data pre-processing, cleansing, and formatting.
Explored standardization of patient data formats such as FHIR for large scale patient record analysis.
Download report and access Github code.

Cal Hacks 6.0 Collegiate Hackathon Project : LateNight
Developed an app as part of a group project that used neighborhood crime data from the local county to develop a safety index for restaurants in Berkeley neighborhoods.

Involved web scraping, crime record analysis, map visualization.
Programmed in Swift and Python.

CS Projects

Full-fledged CPU design

Developed a full-fledged CPU design for processing a full set of RISC-V instructions using Logisim in the CS61C Computer Architecture course.

Development of GitHub Clone

Developed a full-fledged GitHub clone in Java that implements GitHub's repository management functions in the CS61B Data Structures course.

Simulation of the Enigma Machine

Built a Java-based simulator for a generalized version of the Enigma machine, which was used during WWII to encrypt messages with substitution ciphers.

Escape The Tune Game

When I was selected to participate in the competitive California Summer School for Math & Science (COSMOS) at UC Santa Cruz, I designed a video game as part of the video game cluster. It was a space game in which the player navigates a spaceship in sync with musical rhythms. Implemented in Unity and Maya. Code available on GitHub.

Film Projects

Publications

R. Mahmood, G. Wang, M. Kalra, P. Yan, “Fact-Checking of AI-Generated Reports,” in Proc. Machine Learning in Medical Imaging (MICCAI Workshop), Vancouver, BC, Canada, October 2023.
R. Mahmood, X. Liu, A. Xu, R. Akkiraju, “ContrastBERT: Supervised Contrastive Learning of BERT-Encoded IT logs for Anomaly Classification,” in Proc. IEEE Big Data Conference Workshop on Knowledge Discovery in Data Mining on IT Operations. Osaka, Japan, Dec. 2022.
N. Shrivastava, R. Mahmood, T. Syeda-Mahmood, “Spatially-preserving flattening in deep learning for location-aware classification,” in Proc. International Symposium on Biomedical Imaging, Kolkata, India, March, 2022.
R. Mahmood, T. Syeda-Mahmood, “Automatic detection of left ventricular aneurysms in echocardiograms,” in Proc. International Symposium on Biomedical Imaging (ISBI), New York, April 2015. See local copy.
R. Mahmood, T. Syeda-Mahmood, “Automatic detection of dilated cardiomyopathy in cardiac ultrasound videos,” in Proc. American Medical Informatics Association (AMIA) Annual Conference, Washington, D.C., November 2014. See local copy.

Contact
