Hi, I'm KyungIn Nam.
I build hardware that thinks faster.

Electrical Engineering student at UC Irvine with a focus on computer architecture, RTL design, and hardware acceleration. I work at the intersection of chip design and machine learning, turning research ideas into silicon-level solutions. I co-authored a paper accepted at IEEE/ACM DATE 2026.

Currently getting into ASIC back-end flow — synthesis, P&R, timing closure. Still a lot to learn, but that's the direction.

Irvine, CA · BS EE @ UC Irvine · Open to opportunities
Nov 2024 – Present
Undergraduate Researcher
Bias Lab · UC Irvine
  • Wrote Verilog IP blocks for Q8 fixed-point hardware acceleration and validated them through simulation and FPGA testing — a lot of debugging, but learned how throughput and power actually trade off in practice.
  • Built Python tooling to automate AWQ quantization sweeps across multiple design variants, which made benchmarking a lot less painful.
  • Contributed to a paper accepted at IEEE/ACM DATE 2026 — my main role was designing the benchmarking methodology and running the evaluation experiments.
Oct 2024 – Mar 2025
Hackers Intern
Hackers Fap · Irvine, CA
  • Rotated through the full semiconductor design pipeline, including front-end design, mask preparation, and back-end process steps, gaining hands-on experience with real fabrication workflows.
  • Focused on lithography stepper systems; assisted in calibrating exposure parameters, alignment routines, and pattern transfer processes to improve resolution and yield.
IEEE/ACM DATE 2026
H. Oh, K. Nam, R. Bhattacharjya, H. Chen, T. Das, S. Yun, S. Jang, A. Ding, N. Dutt, M. Imani
IEEE/ACM Design, Automation and Test in Europe Conference (DATE 2026)

The problem: ternary LLMs are memory-efficient (8× smaller than FP16), but running them on CPUs is bottlenecked by lookup table (LUT) fetches — those LUTs occupy less than 0.01% of RAM yet account for 87.6% of all memory transactions and 91.6% of execution time. T-SAR fixes this by repurposing SIMD vector registers to generate LUTs on-the-fly instead of loading them from memory, turning a bandwidth-bound problem into a compute-bound one — with only minimal ISA extensions and no new ALUs.

Results: 5.6–24.5× GEMM latency reduction and 1.1–86.2× GEMV throughput improvement over state-of-the-art CPU baselines, across models from 125M to 100B parameters. Memory request volume drops by 8.7–13.8×. On a mobile CPU, a 7B model prefill goes from >20s to under 1.7s. Hardware cost is minimal: only +1.4% area and +3.2% power on a 256-bit SIMD slice (synthesized at TSMC 28nm). Energy efficiency is 2.5–4.9× better than NVIDIA Jetson AGX Orin on the same workloads.

My role: ran the gem5 simulations used to evaluate the ISA extensions, did the ALU-level ternary operator analysis, and built the performance–energy models that drove the final architecture decisions.
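
To make the LUT idea above concrete, here is a minimal Python sketch of a lookup-table ternary GEMV in which the tables are built from the activations at run time instead of being fetched from a stored table. It only illustrates the arithmetic: it does not model T-SAR's SIMD registers or ISA extensions, and the group size and the names `build_lut`, `encode`, and `lut_gemv` are assumptions made for this example, not taken from the paper.

```python
from itertools import product

GROUP = 2  # weights covered by one LUT index (assumed value for illustration)

def build_lut(acts):
    """All partial sums one activation group can produce under ternary weights."""
    return [sum(w * a for w, a in zip(ws, acts))
            for ws in product((-1, 0, 1), repeat=len(acts))]

def encode(ws):
    """Map a ternary weight group to its LUT index (base 3, first weight most significant)."""
    idx = 0
    for w in ws:
        idx = idx * 3 + (w + 1)
    return idx

def lut_gemv(weight_rows, acts):
    """Ternary matrix-vector product using lookups instead of multiplies."""
    # One small table per activation group, generated here and shared by every row.
    luts = [build_lut(acts[i:i + GROUP]) for i in range(0, len(acts), GROUP)]
    return [
        sum(lut[encode(row[g * GROUP:(g + 1) * GROUP])] for g, lut in enumerate(luts))
        for row in weight_rows
    ]

def direct_gemv(weight_rows, acts):
    """Reference result computed with ordinary multiply-accumulate."""
    return [sum(w * a for w, a in zip(row, acts)) for row in weight_rows]
```

Each table has 3^GROUP entries and is reused by every output row, which is why the lookups dominate memory traffic when the tables live in memory rather than in registers.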

FPGA-based Accelerator Prototype

Synthesized RTL accelerator designs onto an FPGA and worked through the timing and utilization issues that only show up once you're on real hardware. Learned a lot about the gap between simulation and actual deployment.

FPGA · RTL Synthesis · Timing Closure · Hardware Acceleration

PEANO-ViT Softmax Approximation RTL

Implemented a Verilog module for hardware-efficient softmax approximation in Q8 fixed-point. Spent a fair amount of time getting the value scaling right — small errors in fixed-point arithmetic compound quickly.

Verilog · Fixed-Point Arithmetic · RTL Design
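
The scaling issue mentioned above can be seen in a small Python model of fixed-point softmax. This is a generic sketch, not the PEANO-ViT approximation and not the Verilog module itself: it assumes "Q8" means 8 fractional bits, and it approximates the exponential with a base-2 shift plus a linear term, a common hardware shortcut chosen here only for illustration.

```python
import math

FRAC_BITS = 8                            # assumed: Q8 = 8 fractional bits
ONE = 1 << FRAC_BITS                     # fixed-point representation of 1.0
LOG2E = round(math.log2(math.e) * ONE)   # log2(e) in Q8

def to_fixed(x):
    """Quantize a real value to Q8."""
    return round(x * ONE)

def exp_fixed(x_fx):
    """Approximate e^x for x <= 0; input and output are Q8 integers."""
    t = (x_fx * LOG2E) >> FRAC_BITS      # product is Q16, shift back to Q8
    int_part = t >> FRAC_BITS            # floor of x * log2(e), always <= 0 here
    frac = t - (int_part << FRAC_BITS)   # fractional part in [0, ONE)
    return (ONE + frac) >> (-int_part)   # 2^frac ~ 1 + frac, then scale by 2^int_part

def softmax_fixed(logits):
    """Softmax over real-valued logits, returning Q8 probabilities."""
    fx = [to_fixed(v) for v in logits]
    m = max(fx)                          # subtract the max so every exponent is <= 0
    exps = [exp_fixed(v - m) for v in fx]
    total = sum(exps)                    # at least ONE, since the max term maps to 1.0
    return [(e << FRAC_BITS) // total for e in exps]
```

Every multiply doubles the number of fractional bits, so each product has to be shifted back by `FRAC_BITS`. Forgetting one of those shifts, or truncating too early, is the kind of small error that compounds across the exponent, sum, and divide stages.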

UART Transmitter / Receiver

Designed and verified UART TX/RX modules in SystemVerilog — baud-rate generation, control logic, and testbench simulation. A foundational project that made me much more careful about timing constraints.

SystemVerilog · Digital Design · Simulation
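
As a rough illustration of the baud-rate generation and framing described above, the Python sketch below computes an integer clock divider with its resulting rate error and builds one serial frame. The 16x oversampling factor, the 8N1 frame format, and the function names are assumptions for this example; the original SystemVerilog modules may use different parameters.

```python
def baud_divisor(clk_hz, baud, oversample=16):
    """Return the integer divider for an oversampled baud tick and the relative rate error."""
    div = round(clk_hz / (baud * oversample))
    actual_baud = clk_hz / (div * oversample)
    return div, (actual_baud - baud) / baud

def frame_bits(byte):
    """Build an 8N1 frame: start bit (0), eight data bits LSB first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]

def decode_frame(bits):
    """Recover the data byte from an 8N1 frame, checking the start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))
```

Because the divider must be an integer, the generated rate rarely matches the target exactly. Checking the relative error against the receiver's tolerance before committing to a clock frequency is a cheap way to catch a mismatch early.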

RTL / HDL

  • SystemVerilog
  • Verilog
  • VHDL

FPGA & Simulation

  • FPGA synthesis & implementation
  • Timing analysis
  • Functional simulation
  • gem5
  • Cadence
  • LTspice

Programming

  • Python
  • C / C++
  • PyTorch

Domains

  • Hardware Acceleration
  • Computer Architecture
  • Quantization & LLM Inference
  • Semiconductor Fabrication

Exploring

  • ASIC back-end flow
  • Synthesis (DC / Yosys)
  • Place & Route

Interested in hardware, architecture, or research collaboration?
Let's talk.