cv

Basics

Name Ahmad Faraz Khan
Label Ph.D. Candidate in Computer Science
Email ahmadfk@vt.edu
Url https://afkd98.github.io
Summary Ph.D. candidate specializing in Machine Learning Systems, with a focus on Federated Learning optimization. Broad expertise across programming languages, systems tools, and ML frameworks, with contributions to both academic research and practical applications.

Education

  • 2020.12 - Present
    Ph.D.
    Virginia Tech
    Computer Science
    • Machine Learning Systems
    • Distributed Systems
    • Deep Learning
    • Machine Learning
    • Cloud Development
    • Computer Systems
  • 2016.01 - 2020.01
    B.S.
    LUMS
    Computer Science
    • Distributed Systems
    • Deep Learning
    • Machine Learning
    • Cloud Development
    • Computer Systems

Work

  • 2020.12 - Present
    Graduate Research Assistant
    Virginia Tech, DSSL
    Mentored by Dr. Ali Butt. My research focuses on resource-constrained learning and spans the design of distributed systems, learning schedulers, and the fine-tuning of Large Language Models (LLMs), with the goal of optimizing resource utilization, accuracy, and efficiency in privacy-aware learning environments.
    • Built a distributed learning system in PyTorch for resource-constrained, privacy-aware learning, improving resource utilization by 81x, scalability by 78x, and accuracy by 53%.
    • Designed a distributed learning parameter server on Apache Spark over Hadoop to support over one million learning nodes, increasing scalability by 4x, reducing latency by 8x, and cutting costs by 2x.
    • Developed a scheduler for distributed learning systems in PyTorch, improving accuracy by 57% and reducing training time by 40%.
    • Engineered an efficient, highly scalable, and cost-effective cache on AWS Lambda, ElastiCache, SageMaker, and EC2 for non-training workloads, decreasing latency by 99.9% and costs by 99.6%.
    • Improved distributed ML schedulers in PyTorch to identify and eliminate adversarial data sources, mitigating 100% of malicious data sources and increasing accuracy by 7%.
    • Developed clustering-based personalized learning solutions in PyTorch for distributed ML systems, improving personalized accuracy by up to 45%.
    • Designed a RAG-based, context-aware LLM framework using Hugging Face and PyTorch to automate the adaptive online configuration of distributed cloud services, improving resource efficiency.
    • Implemented a Direct Preference Optimization (DPO)-based approach to mitigate sycophancy by fine-tuning LLMs on a curated dataset, reducing sycophancy by 64% in persona-based tests and 44% in preference-driven tests.
    • Developed a DPO-based prompt-optimization approach for LLMs that requires no separate reward model, improving evaluation scores by 27% over supervised fine-tuning.
  • 2020.05 - 2020.12
    Associate Data Engineer
    i2c Inc.
    Led the development and maintenance of distributed databases, with a focus on performance optimization and scalability.
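For illustration, the core of the DPO objective used in the sycophancy-mitigation and prompt-optimization work above can be sketched in a few lines of Python. This is a minimal, self-contained version for a single preference pair; function and variable names are illustrative, not taken from the actual codebase.

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities
    under the trained policy and a frozen reference model."""
    # How much more the policy prefers the chosen completion over the
    # rejected one, relative to the reference model's preference.
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)): shrinks as the policy's preference
    # for the chosen completion grows, with no separate reward model.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

In practice this loss is averaged over a batch of preference pairs and minimized with a standard optimizer; beta controls how far the policy may drift from the reference model.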

Publications

Skills

Programming Languages
Python
JavaScript
C/C++
Java
Go
Tools & Libraries
LangChain
Ray
Spark MLlib
Hugging Face
Ollama
PyTorch
TensorFlow
PySpark
AWS Suite
Pandas
Numba
Dask
Docker
IBMfl lib
Flower
FedScale
Hadoop
Kubernetes
OpenFaaS
CUDA

Languages

English
Fluent

Interests

Federated Learning
Resource Optimization
Model Performance
System and Data Heterogeneity
Machine Learning Systems
Scalability
Efficiency
Cloud Development

Projects

  • 2020.12 - Present
    ML System Optimization
    Developed algorithms to enhance ML system architectures for improved resource allocation, scalability, and efficiency.
    • Designed and implemented DynamicFL to address heterogeneity in federated learning, published in IEEE BigData'24 (Best Paper).
    • Created algorithms for personalized federated learning techniques, leading to empirical insights published in IEEE BigData'24.
    • Optimized distributed ML systems for resource-constrained environments, contributing to impactful publications in EuroSys'24 and IEEE BigData'23.
  • 2020.12 - Present
    Federated Learning Frameworks
    Led the design and development of both Horizontal and Vertical Federated Learning frameworks, integrating MLOps pipelines with AWS cloud resources.
    • Developed HFL & VFL frameworks with tokenized incentives for participation, resulting in publications in AAAI and IEEE CLOUD.
    • Integrated MLOps pipelines with AWS resources to ensure scalability and reliability.
    • Contributed to the development of incentive mechanisms for collaborative learning frameworks, as published in IEEE BigData'24 and IPDPS'25.
  • 2021.01 - Present
    LLM Fine-Tuning and Optimization
    Developed methods for fine-tuning large language models (LLMs) to reduce sycophancy, enhance privacy, and optimize prompts for specific tasks.
    • Mitigated sycophancy in LLMs using Direct Preference Optimization, published in IEEE BigData'24.
    • Fine-tuned LLMs with privacy-aware data for use in federated learning systems.
    • Designed prompt optimization techniques tailored for text-to-image synthesis, currently under review at COLM'24.
  • 2021.06 - Present
    Privacy-Preserving Machine Learning
    Developed privacy-preserving mechanisms for federated learning to counter feature inference attacks and enhance security.
    • Designed PETER: a privacy-preserving framework for vertical federated learning, under review at TIFS'24.
    • Proposed feature importance-based incentive mechanisms for federated learning, to be submitted to AAAI'25.
    • Surveyed security threats in deep learning and proposed countermeasures, published in IEEE Access.
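The server-side aggregation step at the heart of the federated learning frameworks above can be sketched as a weighted average of client model parameters (FedAvg-style). This is a minimal illustrative version using plain Python lists; a real implementation would operate on PyTorch tensors and handle stragglers, sampling, and heterogeneity.

```python
def federated_average(client_weights, client_sizes):
    """Aggregate client model parameters, weighting each client's
    contribution by the size of its local dataset."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    avg = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # Each client contributes proportionally to its data share.
            avg[i] += w * (size / total)
    return avg
```

With equal client sizes this reduces to a plain mean; skewed sizes pull the global model toward the larger clients, which is one source of the heterogeneity the DynamicFL work addresses.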