I'm an AI Engineer currently pursuing an MS in Computer Engineering at NYU Tandon, building on a BS with Honors in Cognitive Science (ML & Neural Computation) from UC San Diego.
My work spans the full ML lifecycle — from designing RAG pipelines and fine-tuning LLMs for production at SageX Global and UniQreate, to conducting neuroscience research on neuroimaging and iron metabolism at UCSD. I'm especially drawn to problems at the intersection of efficient ML inference, quantized models, hardware-aware optimization, and autonomous systems.
When I'm not training models, you'll find me building Rust CLI tools, hacking on FPGA-based ZKP accelerators, or contributing to open-source projects like MediaSync (a media library pipeline) and stock portfolio optimization research.
This chat runs the actual onnx-community/Bonsai-1.7B-ONNX model, with 1-bit (q1) quantized weights, directly in your browser through WebGPU. No server, no API calls, and no data leaves your machine. If WebGPU is unavailable or the model fails to load, the interface reports the exact failure rather than fabricating a response.
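The loading flow described above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual source: it assumes Transformers.js (`@huggingface/transformers`) as the runtime, the CDN import path, and the `describeFailure` helper are all my own stand-ins for whatever the page really uses.

```javascript
// Hedged sketch of in-browser inference with explicit failure reporting.
// Assumptions: Transformers.js as the runtime, jsDelivr as the CDN, and the
// helper/function names below are illustrative, not the demo's real code.

// Pure helper: turn a failure at any stage into the exact message the UI shows,
// instead of silently falling back or fabricating a reply.
function describeFailure(stage, err) {
  const detail = err && err.message ? err.message : String(err);
  return `[${stage}] ${detail}`;
}

async function initChat() {
  // Feature-detect WebGPU before attempting to load anything.
  if (typeof navigator === 'undefined' || !('gpu' in navigator)) {
    throw new Error(describeFailure('webgpu', new Error('navigator.gpu is unavailable')));
  }
  try {
    // Dynamic import keeps the initial page load light; the URL is an assumption.
    const { pipeline } = await import('https://cdn.jsdelivr.net/npm/@huggingface/transformers');
    // device: 'webgpu' runs inference on the GPU. Selecting the 1-bit weight
    // variant is demo-specific configuration and is omitted here.
    return await pipeline('text-generation', 'onnx-community/Bonsai-1.7B-ONNX', {
      device: 'webgpu',
    });
  } catch (err) {
    // Surface the real error to the interface; no synthetic responses.
    throw new Error(describeFailure('model-load', err));
  }
}
```

The key design point mirrored here is that every failure path throws with a stage-tagged message, so the UI can display exactly what went wrong (missing WebGPU vs. a model download or initialization error) instead of degrading silently.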