Portfolio

Here are some of the projects I'm proud of:

LeetCode

I decided to try to level up some of my coding skills by practicing on LeetCode and I surprised myself by getting "Beats 100%" for speed on an easy, but popular, question. So I tried something a bit harder and found a medium-level question where I did the same. But LeetCode scores in buckets and so maybe those scores weren't that impressive and it was only a medium-level problem (although I didn't see another solution posted that I liked better).

2718. Sum of matrix after queries - 152ms 78.31MB

That night, as I was trying to fall asleep, I realized that there were still some inefficiencies with my code. So the next day, I came back and cut that time in half (I guess, maybe? LeetCode runtimes are super random) but I also got in the 100% bucket for memory usage too:

My Solution: I realized that I didn't need to track every change to the matrix and that I was rewriting the same elements over and over. So if I ran the data backwards, I wouldn't need to overwrite any elements that already contained a value.

HealthVision (HDAI)

I worked on all aspects of this application which is available in hospitals throughout the country (inside Epic). This application allows physicians and nurses to gain valuable insights into their patients' medical history using data science and AI tools. This app queries tables with hundreds of million to billions of rows, so we built an advanced filter selector to allow users to find the data they're looking for easily. I created the tables to save the column and filter selections in and connected much of that functionality from the front end to the database and back. I also built much of the UI for the custom saved view selector.

Before that, I worked on linking three tables of nested data to display HCC gap information to help ACOs stay on top of their reporting. These pages had breadcrumbs and saved filter nad scroll state between navigation and complex table functionality using React Suite and Ant-D. Added patient scorecards including Google Maps integration, star ratings, and various data widgets.

PhysGPT (Private Investor)

This project ingested millions of vectors of data from over 5000 PDFs (some 100,000 pages long) that were OCRed, split into one paragraph chunks, embedded, and uploaded to Pinecone. When the user enters a query via the UI, the query is converted to a vector and sent to Pinecone (I also tested this with Milvus but Pinecone was more performant and easier than managing my own server). Pinecone then takes that vector, finds the top 10 closest (most semantically similar) vectors, and returns them. I query MongoDB to get the actual text and book metadata then I query the LLM with the text as well as an engineered prompt as context to get an AI-generated summary of the topic.

This was built with FastAPI and React, then Dockerized and pushed to ECR. The infrastructure was provisioned on AWS using Terraform. User management and account info is stored in DynamoDB, vectors are stored in Pinecone, and text and metadata in MongoDB.

Some of the more challenging bits about this project were:

Extracting meaningful text from PDFs

Tesseract was useful but setting the correct page segmentation mode was important for getting good results. The best page segmentation mode greatly depended on the text and there are things like image captions that would be better to keep with their associated images than pulled into the body mid-paragraph. I tried Scibeam parser and a few other solutions but none of them really worked for all my documents and I didn't want to pay for a solution like unstructured.oi. I'm currently working on a solution of my own for this sort of thing.

Splitting text into one paragraph chunks

This was a little bit more challenging than it appeared at first glance. I know there are tools like Langchain out there, but I didn't really think it offered me much and I wanted more control over the process. I didn't like the idea of overlapping my text chunks because it meant more tokens and it means you're getting multiple matches for the same piece of text so I looked into other options. Text tiling seems like a promising solution and it may be where I end up, but for now, I wrote my own code to split paragraphs based on where the OCR split things and it seems to work well enough.

Personal website - jonathanrys.com (Personal Project)

I built this site using ReMix. I chose ReMix for a few reasons: I was interested in proxy state management using MobX in React, I had read Kent C. Dodd's article on ReMix and wanted to try it out, and I wanted to learn DynamoDB. ReMix comes with a basic home page design out of the box. I liked it better than any previous idea I had and didn't want to re-invent the wheel, so I modified it and added some pages of my own. Design guidelines say to choose 3 colors for your brand and stick with those; I chose black, white and gray. I'm not really much of a designer, but I can quickly build complex, pixel-perfect UIs.

Queuing system (BetterLesson)

I took over ownership of the queuing system and the associated webhooks. I managed setting up and configuring a FIFO queue as well as dead-letter queues. This was built in AWS Lambda(Python) and SQS. The queuing system handled webhook calls from Zoom and Salesforce and updated our app and data warehouse whenever a data source changed. This project was challenging because all of the functions had to be idempotent for this to work properly.

Random search results (BetterLesson)

I was tasked with returning random articles from our collection of articles in ElasticSearch when searching without any search term entered. We wanted teachers to be able to save and share their search results and we also wanted the results to be relevant to the user so I used a gaussian decay function to match articles on the grades the teacher taught and created a gradient to group related subjects. Then I used a salt in the url as a random seed to allow users to save and share their randomized search results.

Coaching Platform (BetterLesson)

I worked on various features of our interactive coaching platform, from video integration with Zoom to user management consoles. This application was built with Flask and React and pulled data from Zoom, Salesforce, CoachBase(our data warehouse), and MySQL together to allow teachers and their coaches to lay out a learning plan, share milestones and feedback, schedule recurring meetings and event series, and share video clips from meetings all within our platform.

Python 3 Upgrade (BetterLesson)

I led the upgrade of our servers in all environments from Python 2.7 to Python 3.6 and updated the encoding and collation of character-based columns in MySQL from latin1 to utf8mb4. Upgraded libraries, updated syntax, fixed circular imports, and removed old code. Python 2.7 had become unusable for us as pip stopped supporting Python 2.7. The details of that endeavor are captured in this Stack Overflow post.

Mobile app to check product ingredients (Personal Project)

Designed and built a mobile app for Android using React Native. This app allows the user to set dietary preferences within the app and then scan food products at the store. The app takes the barcode and calls the OpenFoodFacts.org API to get a list of ingredients and other dietary information. Then it makes a call to my Python API to classify the ingredients. Then, based on the user's preferences, the UI would let them know if the product met their approval.

Structural Analysis Tool (American Tower)

I wrote C++ code for the structural analysis software used to test the integrity of tower designs and to identify failure points within the structures so that bracing could be added to support proposed loads. No other structural design software met American Tower's costs and needs at the time so they built and maintained their own.

Arduino project for my parents (Mom & Dad)

I wrote Arduino C++ code to control the lights on the stairs at my parent's house using motion sensors. This required using a priority queue so that someone triggering one set of lights doesn't affect another set of previously triggered lights.

OnSite (American Tower)

This was an app to help technicians take inventory of the equipment on cell phone towers. The app needed to be available offline so they could upload the data when they returned to their homes so I made heavy use of localStorage and service workers. This app was built with jQuery, CSS, and HTML. I was not initially the architect on this project, but I came to own the product. I used event delegation to keep the number of event handlers to a reasonable number and built the interface for taking inventory of the equipment on the towers as well as most of the other functionality for surveying the site and site access.

Digits (Pearson)

This was a digital learning platform that I helped work on. It was built with HTML, CSS, MathML, and Javascript. This was my first time working with responsive/adaptive design in the workplace.

Personal projects

I built a personal site using Adobe Dreamweaver because I was doubtful in my abilities to create a cross-browser compatible nested menu system on my own and I wanted to see how their system worked. I ended up adapting their code to my needs and taking control back from the design software, but it was a great starting point.

I built an H.P. Lovecraft-inspired Necronomicon text-based video game in C with some basic animated graphics on the splash screen. This was nothing special, but took a considerable about of skill to write in C.

Various projects

Along with the many personal projects and school assignments I worked on, I worked as a consultant for Steven J. Wessling Architects doing mostly networking and systems work. I also did various database, data science and web development work for various other companies including: Lifecycle, Cambridge Lights, and Intelligent Labor and Moving.

Texaco Project (Texaco)

I rewrote some legacy FORTRAN code that calculated the results of Cumene reactions to work with the inputs and outputs of the flowcharting software Aspen for my father. We used the results to program EEPROM chips for use in refinery control systems. This required rewriting all the loops to use a newer syntax as well as some other syntax changes and all I had was a dot-matrix printed FORTRAN manual to help me. After getting the code working in the flowcharting software, we were able to run simulations and adjust the inputs to find the optimal solution. We then took that solution and used it to program EEPROM chips for use in the refinery's control systems saving them more than $1M per year.