CV
Basics
Name | Ahmad Faraz Khan
Label | Ph.D. Candidate in Computer Science
Email | ahmadfk@vt.edu
Url | https://afkd98.github.io
Summary | Ph.D. candidate specializing in Machine Learning Systems, with a focus on Federated Learning optimization. Broad expertise across programming languages and development tools, with contributions to both academic research and practical applications.
Education
- Ph.D. in Computer Science, Virginia Tech (in progress)
Work
- 2020.12 - Present
Graduate Research Assistant
Virginia Tech, DSSL
Advised by Dr. Ali Butt, focusing on solutions for resource-constrained learning. Research spans the design of distributed systems, learning schedulers, and the fine-tuning of Large Language Models (LLMs), aiming to optimize resource utilization, accuracy, and efficiency in privacy-aware learning environments.
- Built a distributed learning system in PyTorch for resource-constrained, privacy-aware learning, improving resource utilization by 81x, scalability by 78x, and accuracy by 53%.
- Designed a distributed learning parameter server on Apache Spark/Hadoop to support over one million learning nodes, increasing scalability by 4x, reducing latency by 8x, and cutting costs by 2x.
- Developed a scheduler for distributed learning systems in PyTorch, improving accuracy by 57% and reducing training time by 40%.
- Engineered an efficient, highly scalable, and cost-effective cache on AWS Lambda, ElastiCache, SageMaker, and EC2 for non-training workloads, decreasing latency by 99.9% and costs by 99.6% (a toy sketch of the caching idea appears at the end of this section).
- Extended distributed ML schedulers in PyTorch to identify and eliminate adversarial data sources, mitigating 100% of malicious data sources and increasing accuracy by 7%.
- Developed clustering-based personalized learning solutions in PyTorch for distributed ML systems, improving personalized accuracy by up to 45%.
- Designed a RAG-based, context-aware LLM framework using Hugging Face and PyTorch, automating the adaptive online configuration of distributed cloud services to improve resource efficiency.
- Implemented a Direct Preference Optimization (DPO) approach to mitigate sycophancy by fine-tuning LLMs on a curated dataset, reducing sycophancy by 64% in persona-based tests and 44% in preference-driven tests.
- Developed a DPO-based approach for prompt optimization that requires no separate reward model, improving scores by 27% over supervised fine-tuning (a sketch of the DPO objective follows this list).
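The two DPO bullets above build on the standard Direct Preference Optimization objective (Rafailov et al., 2023). A minimal PyTorch sketch of that loss; the function and argument names are illustrative, not taken from the actual codebase:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Each input is the summed log-probability of a response under the
    # trainable policy or the frozen reference model; beta scales the
    # implicit KL penalty that keeps the policy near the reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the reward margin of preferred over dispreferred responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Both bullets reduce to this objective and differ only in how the preference pairs are constructed (anti-sycophancy pairs versus prompt-quality pairs).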
- 2020.05 - 2020.12
Associate Data Engineer
i2c Inc.
Spearheaded the development and maintenance of distributed databases, focusing on performance optimization and scalability.
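Context for the serverless cache in the graduate research role above: non-training FL workloads (debugging, analytics, audits) repeatedly fetch the same per-round metadata, so memoizing results removes most of the latency and cost of recomputation. A toy in-memory sketch of the idea, assuming a simple (workload, round) key; the deployed system instead sits on AWS Lambda and ElastiCache:

```python
from collections import OrderedDict

class WorkloadCache:
    # Toy LRU cache for results of non-training workloads, keyed by
    # (workload_name, round_number). Illustrative only.
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, workload, round_no):
        key = (workload, round_no)
        if key not in self._store:
            return None  # caller recomputes and calls put()
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, workload, round_no, result):
        key = (workload, round_no)
        self._store[key] = result
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
```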
Publications
-
2025.7.31 DynamicFL: Federated Learning with Dynamic Communication Resource Allocation
Submitted: ACM EuroSys'25
Designed 'DynamicFL', which dynamically allocates communication resources in distributed learning based on data heterogeneity, improving model accuracy by up to 10% over standard methods.
-
2025.7.31 FLStore: A Cache for Non-Training Workloads in Federated Learning
Submitted: USENIX FAST'25
Designed 'FLStore', a locality-aware processing cache that handles non-training workloads of distributed privacy-aware learning efficiently and at low cost, decreasing latency by up to 99.9% and cost by up to 99.6%.
-
2025.6.12 PETER: Privacy-Preserving Vertical Federated Learning Against Feature Inference Attacks
Submitted: TIFS'24
Proposed a lossless and efficient defense mechanism for inference attacks in Vertical Federated Learning environments.
-
2024.7.9 Prompt optimization for LLMs
Submitted: COLM'24
Developed a Direct Preference Optimization approach that harnesses human preferences for prompt optimization in text-to-image tasks, improving scores by 27% over supervised fine-tuning.
-
2024.7.9 Mitigating sycophancy in LLMs
Submitted: ICML Workshop'24
Introduced a Direct Preference Optimization approach to mitigate sycophancy by fine-tuning LLMs on a curated dataset. Reduced sycophancy by 64% in persona-based tests and 44% in preference-driven tests.
-
2024.7.9 Analyzing Personalized Machine Learning Algorithms
Submitted: VLDB'24
Conducted a thorough analysis of personalized ML algorithms, highlighting the trade-offs between privacy and performance.
-
2024.4.22 FLOAT: Federated Learning Optimizations with Automated Tuning
Published: ACM EuroSys'24
Designed 'FLOAT', a framework enabling distributed learning and fine-tuning with high efficiency and resource utilization at low cost on constrained, heterogeneous Edge devices, leveraging Reinforcement Learning from Human Feedback. Improved resource utilization by 81x, scalability by 78x, and accuracy by 53%.
-
2024.4.15 Enhancing Personalized Distributed Learning with Incentives
Submitted: NeurIPS'24
Designed an incentive-driven personalized training and fine-tuning framework for distributed learning on resource-constrained, data-heterogeneous Edge devices, improving personalized accuracy by up to 45%.
-
2023.12.15 Adaptive Machine Learning Aggregator for Edge and IoT
Published: IEEE BigData'23
Developed an adaptive aggregator that significantly improves scalability and time efficiency for Federated Learning on Edge and IoT devices, increasing scalability by 4x, reducing latency by 8x, and cutting costs by 2x.
-
2023.10.20 Survey on Adversarial Tactics in DNN, DRL, FL, and TL Models
Published: IEEE Access'24
Conducted a survey on adversarial tactics in deep learning models, emphasizing their applications and distinct features.
-
2022.12.17 Heterogeneity-Aware Adaptive Machine Learning Scheduling System
Published: IEEE BigData'22
Designed a heterogeneity-aware scheduler for distributed learning systems in PyTorch to balance efficiency and accuracy, improving accuracy by 57% and reducing training time by 40%.
-
2022.7.10 Distributed Learning Schedulers
Published: FL-AAAI'22, IEEE CLOUD'22, AAAI-AIES'24
Developed a scheduler for distributed learning systems in PyTorch, improving accuracy by 57% and reducing training time by 40% (a toy sketch of the selection idea follows this list).
-
2022.7.9 Privacy Preserving and Feature Importance Based Incentive Mechanism in Vertical Federated Learning
In Progress
Proposed 'PERFACY-FL', an incentive mechanism for monetizing Vertical Federated Learning that values data quality and privacy using Homomorphic Encryption, boosting participation and profitability.
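The scheduler publications above center on heterogeneity-aware client selection: prefer clients whose data is statistically useful while avoiding stragglers. A toy sketch of such a selection rule, assuming illustrative utility and latency fields rather than the published systems' actual signals:

```python
import random

def select_clients(clients, round_budget, alpha=0.5):
    # Blend statistical utility (e.g., recent local loss) with system
    # speed (inverse latency); both fields are illustrative placeholders.
    def score(c):
        return alpha * c["utility"] + (1 - alpha) / c["latency"]
    # Rank all clients, then sample from the top tier so selection
    # keeps some exploration instead of always picking the same few.
    ranked = sorted(clients, key=score, reverse=True)
    pool = ranked[:2 * round_budget]
    return random.sample(pool, min(round_budget, len(pool)))
```

Here clients is a list of dicts such as {"utility": 0.8, "latency": 1.2}; a real scheduler refreshes these signals each round from client reports.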
Skills
Programming Languages | Python, JavaScript, C/C++, Java, Go
Tools & Libraries | PySpark, AWS Suite, Pandas, Numba, Dask, Docker, PyTorch, TensorFlow, IBMfl lib, FedScale, Selenium, Appium, gnuplot, ES6+, TypeScript, React/Redux, Node.js, Express, MongoDB, SQL, FLSim, Spark MLlib, Hadoop, Kubernetes, OpenFaaS, CUDA
Languages
English | Fluent
Interests
Federated Learning | Resource Optimization, Model Performance, System and Data Heterogeneity
Machine Learning Systems | Scalability, Efficiency, Cloud Development
Projects
- 2020.12 - Present
Federated Learning Frameworks
Led the design and development of both Horizontal and Vertical Federated Learning frameworks, integrating MLOps pipelines with AWS cloud resources (a toy aggregation sketch follows this section).
- HFL & VFL framework development
- MLOps pipeline integration with AWS
- Significant contributions to AAAI'24 and AAMAS'24
- 2020.12 - Present
ML System Optimization
Developed algorithms that improve resource allocation, scalability, and efficiency in ML system architectures.
- Algorithm development for system optimization
- Contributions leading to publications at IEEE and ACM conferences
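For the Federated Learning Frameworks project above, the core server-side step of any Horizontal FL framework is weighted parameter averaging (FedAvg). A minimal PyTorch sketch, assuming weighting by local dataset size; not the project's exact pipeline:

```python
import torch

def fedavg(state_dicts, num_samples):
    # Classic FedAvg aggregation: average client model parameters,
    # weighted by each client's local dataset size.
    total = float(sum(num_samples))
    return {
        key: sum(sd[key].float() * (n / total)
                 for sd, n in zip(state_dicts, num_samples))
        for key in state_dicts[0]
    }
```

Vertical FL aggregates differently (partial activations or gradients per feature partition), so this averaging step applies only to the horizontal side of the project.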