Hi, I'm Vaish

I'm an MS student @ Cornell, interested in machine learning.
As an undergrad, I majored in Computer Science + Math (also @ Cornell).
This website is a dumping ground for my projects and various other links.

what i do
Most recently, I interned at Databricks, working on infra for Python notebooks. In the past, I've had the pleasure of interning at Meta AI, where I worked on making vision transformers for videos more compute-efficient. Even before that, I interned at Instagram, working on backend infra + ML for suggested user recommendations.
As a researcher at CUAI, I enjoy working on interesting graph ML problems! I'm also a teaching assistant for Networks at Cornell. Resumes are too short, so read on below for more details :)
work experience
Databricks Software Engineering Intern

I developed an interactive Python debugger for Databricks Notebooks. This involved low-level work using libraries like debugpy and ipykernel, as well as web dev work to design the frontend and handle the state machine logic of a debugger.

The debugger improves on comparable offerings like Jupyter Notebooks by emphasizing features like concurrent user support and session reliability during long-running jobs. I also made open source contributions to ipykernel.

I got familiar with sharded deployments and tools like Kubernetes and Bazel. I also got the opportunity to write design docs for key architectural decisions, and learnt to scope out large projects.

Python Scala TypeScript May 2023 - August 2023
Meta AI Software Engineering Intern

I worked on making vision transformers for videos faster, while maintaining accuracy. Videos make up a large part of content consumed, so making ML models more efficient is very relevant to downstream tasks like video understanding in Instagram Reels.

I implemented adaptive token sampling modules from research papers in PyTorch, and incorporated them within transformer blocks. I trained and evaluated the models on internal datasets, and integrated the changes into production workflows.

This led to a 25% decrease in video transformer latency and FLOPs, and an overall 10% decrease in the final production model.

PyTorch Lightning SQL June 2022 - August 2022
Instagram Software Engineering Intern

I worked on infra in the Instagram Suggested Users team, to increase CPU efficiency and recommendation quality.

My project involved optimizing when the cache is refreshed, by predicting a user's peak activity times and scheduling the refresh accordingly. This led to a 30% reduction in CPU usage during the busiest hours, and an 8% decrease overall.

Used C++ and Python, as well as SQL and A/B testing frameworks for statistical analysis.

C++ Python SQL June 2021 - August 2021
Cornell Head Teaching Assistant

I've been responsible for holding recitations, setting up autograders, writing course materials, holding office hours, and leading grading sessions. I've also delivered some in-class lectures while substituting for the professor. I received the Alan Marx Memorial Prize for Excellence in Teaching ($500), given to 1 senior from the graduating CS class.

I've had the pleasure of TA-ing classes taught by some amazing professors! Here's a history:

  • CS 2802 (Honors Discrete Math): Fall '20
  • CS 4820 (Intro to Analysis of Algorithms): Spring '21, Summer '21, Fall '21, Fall '22
  • CS 4850 (Mathematical Foundations for the Information Age): Spring '22
  • CS 6850 (Theory of Information Networks): Spring '23, Fall '23

grading very long proofs fun slide animations August 2020 - present
Cornell DTI Backend Developer

I worked on CoursePlan, a website that helps Cornell students plan out their course requirements for graduation. This involved implementing infrastructure to check for different requirement fulfillment conditions through a bipartite matching algorithm. So far, we have over 3000 users!
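
As a rough illustration of the bipartite matching idea (not the actual CoursePlan code, which is written in TypeScript), here is a minimal Python sketch using augmenting paths. The function name, the requirement names, and the rule that each course counts toward at most one requirement are all assumptions made for the example.

```python
def match_courses(eligible: dict[str, list[str]]) -> dict[str, str]:
    """Map each requirement to a distinct course that can satisfy it.

    `eligible` maps a requirement name to the courses that could fulfil it.
    Assumption: a course may count toward at most one requirement.
    """
    course_to_req: dict[str, str] = {}

    def try_assign(req: str, seen: set[str]) -> bool:
        # Try every course for this requirement; if a course is taken,
        # see whether its current requirement can move to another course.
        for course in eligible[req]:
            if course in seen:
                continue
            seen.add(course)
            if course not in course_to_req or try_assign(course_to_req[course], seen):
                course_to_req[course] = req
                return True
        return False

    for req in eligible:
        try_assign(req, set())
    return {req: course for course, req in course_to_req.items()}


# Hypothetical requirements: the elective grabs CS 4820 first, then gets
# bumped to CS 4850 so that the stricter requirement can also be met.
plan = match_courses({
    "Theory elective": ["CS 4820", "CS 4850"],
    "Algorithms": ["CS 4820"],
})
```

A greedy assignment would leave "Algorithms" unfulfilled here; the augmenting step is what reshuffles the earlier choice.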

Previously a dev on flux, an app to assess real-time traffic flow at Cornell campus locations. Implemented a model to predict queue wait-times using swipe data, which improves through weighted crowdsourced feedback.

TypeScript Firebase August 2020 - May 2022
Currently, I'm interested in graph ML methods and applications of algorithm design to machine learning. I'm also exploring CS theory research, advised by Professor Robert Kleinberg.
CU Artificial Intelligence

Working on deep learning graph ML methods that perform well on non-homophilous graphs. Many modern state-of-the-art methods rely on homophilous structures, meaning that they work best on graphs where you and your neighbour are likely to have the same label.
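
One common way to quantify this is the edge homophily ratio: the fraction of edges whose endpoints share a label. The Python sketch below is only an illustration of that idea, with a made-up function name and toy graphs; it is not necessarily the measure used in [1].

```python
def edge_homophily(edges: list[tuple[int, int]], labels: dict[int, str]) -> float:
    """Fraction of edges whose two endpoints carry the same label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy graph where neighbours always agree (fully homophilous).
homophilous = edge_homophily(
    [(0, 1), (1, 2), (3, 4)],
    {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"},
)

# Toy path graph where neighbours always disagree (non-homophilous).
heterophilous = edge_homophily(
    [(0, 1), (1, 2), (2, 3)],
    {0: "a", 1: "b", 2: "a", 3: "b"},
)
```

A ratio near 1 means neighbours mostly agree; a ratio near 0 is the regime where methods built around homophily tend to struggle.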

[1] Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods NeurIPS 2021
March 2021 - present
Online Learning Algorithms

Worked with Professor Robert Kleinberg on online randomized algorithms for approximating cumulative distribution functions (CDFs) under adversarial conditions, using only comparison feedback. A real-world example of this would be trying to estimate customers' willingness to pay for a good, by adjusting prices and observing their decisions.

[2] Non-Stochastic CDF Estimation Using Threshold Queries SODA 2023
September 2021 - present
Buckler Lab

I worked on BioKotlin, a statically typed library in Kotlin and C providing fast and optimized genetic analysis tools. Particularly, I worked on setting up the algorithms and infrastructure for computing sequence alignments and efficiently storing DNA sequences and alignments.

I also worked on using machine learning to predict maize haplotypes by optimally stitching together fragmented DNA alignments.

June 2020 - September 2020
PhyloML Github Demo

A phylogenetic tree library for parsing species DNA sequences and generating most-likely phylogenetic trees (demoed using a shiny React frontend). The tree generation algorithms range from simple distance methods to Bayesian inference Markov Chain Monte Carlo methods.

The library also provides heuristic multiple sequence alignment capabilities, ASCII-art representations of MSAs and phylogenetic trees, and functionality for parsing and writing to PhyloXML files.

OCaml React.js March 2020 - August 2020
Xi Compiler

As a team of 3, we wrote a compiler for the Xi language (based off C), involving lexing, parsing, type-checking, semantic analysis, and assembly code generation.

We also implemented optimizations like dynamic register allocation, common subexpression elimination, dead-code elimination, copy propagation, and constant propagation.

Deemed best compiler for CS 4120, based off of benchmarks testing both correctness and efficiency of the generated code. Most importantly, we got a cool plaque :-)

Kotlin Feb 2021 - June 2021
Crunch Github

A fast command line tool for lossless compression and decompression of files. The tool can be used on any filetype, though the level of compression achieved varies. So far, I've implemented Huffman coding and the Lempel–Ziv–Markov chain algorithm.

Implementing Huffman coding involved optimizing the storage size of the generated Huffman table, and implementing classes to read and write data from files bit-by-bit using a buffer.
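
For a feel of the two pieces mentioned above, here is a small Python sketch (Crunch itself is in C++, and this is not its code). It builds a Huffman code table from byte frequencies and packs a bit string into bytes; the function names and the zero-padding of the last byte are assumptions made for the example.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict[int, str]:
    """Return a prefix-free bit string for each byte value in `data`."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreak, tree). A tree is either a byte
    # value (leaf) or a (left, right) tuple; the unique tiebreak keeps
    # heapq from ever comparing trees directly.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1
    codes: dict[int, str] = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' into bytes, zero-padding the final byte."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


message = b"abracadabra"
codes = huffman_codes(message)
encoded_bits = "".join(codes[b] for b in message)
packed = pack_bits(encoded_bits)
```

A real file format would also need to store the table and the number of padding bits so the decoder knows where the data ends.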

C++ May 2020 - July 2020
Preventing ROP Website

A responsive online learning platform consisting of modules regarding optimal neonatal care practices for preventing retinopathy of prematurity.

The modules consist of webinars, infographics, and timed multiple choice tests users can take to get certified. So far, they have been used by over 10,000 medical professionals across India.

The platform is used as a learning aid in training sessions at medical schools, and was part of a study assessing the effectiveness of hybrid online teaching methods.

MySQL PHP January 2018 - June 2019

A 10,000 line+ game consisting of programmable "critters" living in a world of hexagons. Designing the game involved:

- Writing a parser from scratch for the critter language.

- Implementing a GUI with fast panning and zooming designed to support very large worlds, and a mini-map.

- Writing a thread-safe distributed implementation to allow multiple users to log on to a server and play on the same world.

Java September 2019 - December 2020