cv

Basics

Name Ahmad Faraz Khan
Label Ph.D. Candidate in Computer Science
Email ahmadfk@vt.edu
Url https://afkd98.github.io
Summary Ph.D. candidate specializing in Machine Learning Systems, with a focus on Federated Learning optimization. Broad expertise across programming languages, systems tools, and ML frameworks, with contributions to both academic research and practical applications.

Education

  • 2020.12 - Present
    Ph.D.
    Virginia Tech
    Computer Science
    • Machine Learning Systems
    • Distributed Systems
    • Deep Learning
    • Machine Learning
    • Cloud Development
    • Computer Systems
  • 2016.01 - 2020.01
    B.S.
    LUMS
    Computer Science
    • Distributed Systems
    • Deep Learning
    • Machine Learning
    • Cloud Development
    • Computer Systems

Work

  • 2020.12 - Present
    Graduate Research Assistant
    Virginia Tech, DSSL
    Mentored by Dr. Ali Butt. My research focuses on resource-constrained learning and spans the design of distributed systems, learning schedulers, and the fine-tuning of Large Language Models (LLMs), with the goal of optimizing resource utilization, accuracy, and efficiency in privacy-aware learning environments.
    • Built a distributed learning system in PyTorch for resource-constrained, privacy-aware learning, improving resource utilization by 81x, scalability by 78x, and accuracy by 53%.
    • Designed a distributed learning parameter server on Apache Spark over Hadoop to support over one million learning nodes, increasing scalability by 4x, reducing latency by 8x, and cutting costs by 2x.
    • Developed a scheduler for distributed learning systems in PyTorch, improving accuracy by 57% and reducing training time by 40%.
    • Engineered an efficient, highly scalable, and cost-effective cache on AWS Lambda, ElastiCache, SageMaker, and EC2 for non-training workloads, decreasing latency by 99.9% and costs by 99.6%.
    • Improved distributed ML schedulers in PyTorch to identify and eliminate adversarial data sources, mitigating 100% of malicious data sources and increasing accuracy by 7%.
    • Developed clustering-based personalized learning solutions in PyTorch for distributed ML systems, improving personalized accuracy by up to 45%.
    • Designed a RAG-based, context-aware LLM framework using Hugging Face and PyTorch to automate the adaptive online configuration of distributed cloud services, improving resource efficiency.
    • Implemented a Direct Preference Optimization (DPO)-based approach to mitigate sycophancy by fine-tuning LLMs on a curated dataset, reducing sycophancy by 64% in persona-based tests and 44% in preference-driven tests.
    • Developed a DPO-based prompt-optimization approach for LLMs that requires no separate reward model, improving evaluation scores by 27% over supervised fine-tuning.
  • 2020.05 - 2020.12
    Associate Data Engineer
    i2c Inc.
    Led the development and maintenance of distributed databases, with a focus on performance optimization and scalability.
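For illustration, the core of the DPO objective used in the sycophancy-mitigation and prompt-optimization work above can be sketched in a few lines of Python. This is a minimal, self-contained version for a single preference pair; function and variable names are illustrative, not taken from the actual codebase.

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities
    under the trained policy and a frozen reference model."""
    # How much more the policy prefers the chosen completion over the
    # rejected one, relative to the reference model's preference.
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)): shrinks as the policy's preference
    # for the chosen completion grows, with no separate reward model.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

In practice this loss is averaged over a batch of preference pairs and minimized with a standard optimizer; beta controls how far the policy may drift from the reference model.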

Publications

Skills

Programming Languages
Python
JavaScript
C/C++
Java
Go
Tools & Libraries
LangChain
Ray
Spark MLlib
Hugging Face
Ollama
PyTorch
TensorFlow
PySpark
AWS Suite
Pandas
Numba
Dask
Docker
IBMfl lib
Flower
FedScale
Hadoop
Kubernetes
OpenFaaS
CUDA

Languages

English
Fluent

Interests

Federated Learning
Resource Optimization
Model Performance
System and Data Heterogeneity
Machine Learning Systems
Scalability
Efficiency
Cloud Development

Projects

  • 2020.12 - Present
    ML System Optimization
    Developed algorithms to enhance ML system architectures for improved resource allocation, scalability, and efficiency.
    • Designed and implemented DynamicFL to address heterogeneity in federated learning, published in IEEE BigData'24 (Best Paper).
    • Created algorithms for personalized federated learning techniques, leading to empirical insights published in IEEE BigData'24.
    • Optimized distributed ML systems for resource-constrained environments, contributing to impactful publications in EuroSys'24 and IEEE BigData'23.
  • 2020.12 - Present
    Federated Learning Frameworks
    Led the design and development of both Horizontal and Vertical Federated Learning frameworks, integrating MLOps pipelines with AWS cloud resources.
    • Developed HFL & VFL frameworks with tokenized incentives for participation, resulting in publications in AAAI and IEEE CLOUD.
    • Integrated MLOps pipelines with AWS resources to ensure scalability and reliability.
    • Contributed to the development of incentive mechanisms for collaborative learning frameworks, as published in IEEE BigData'24 and IPDPS'25.
  • 2021.01 - Present
    LLM Fine-Tuning and Optimization
    Developed methods for fine-tuning large language models (LLMs) to reduce sycophancy, enhance privacy, and optimize prompts for specific tasks.
    • Mitigated sycophancy in LLMs using Direct Preference Optimization, published in IEEE BigData'24.
    • Fine-tuned LLMs with privacy-aware data for use in federated learning systems.
    • Designed prompt optimization techniques tailored for text-to-image synthesis, currently under review at COLM'24.
  • 2021.06 - Present
    Privacy-Preserving Machine Learning
    Developed privacy-preserving mechanisms for federated learning to counter feature inference attacks and enhance security.
    • Designed PETER: a privacy-preserving framework for vertical federated learning, under review at TIFS'24.
    • Proposed feature importance-based incentive mechanisms for federated learning, to be submitted to AAAI'25.
    • Surveyed security threats in deep learning and proposed countermeasures, published in IEEE Access.
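The server-side aggregation step at the heart of the federated learning frameworks above can be sketched as a weighted average of client model parameters (FedAvg-style). This is a minimal illustrative version using plain Python lists; a real implementation would operate on PyTorch tensors and handle stragglers, sampling, and heterogeneity.

```python
def federated_average(client_weights, client_sizes):
    """Aggregate client model parameters, weighting each client's
    contribution by the size of its local dataset."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    avg = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # Each client contributes proportionally to its data share.
            avg[i] += w * (size / total)
    return avg
```

With equal client sizes this reduces to a plain mean; skewed sizes pull the global model toward the larger clients, which is one source of the heterogeneity the DynamicFL work addresses.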