Pavel Vasilyev

LLM Engineer / AI Infrastructure Specialist
Jerusalem, Israel • marchdown@gmail.com • +972-54-343-1123 • marchdown • pavel-vasilyev-65105b149

Summary

LLM engineer with 14 years of ML/NLP experience and 3 years specializing in production language model systems.
Expert in LLM workflows, safety guardrails, prompt engineering, and scaling generative AI infrastructure.
Strong functional programming background (Clojure, Elixir, Rust) with production Python, Java, and R experience.
MSc Computational Linguistics, BSc Nuclear Physics.

Experience

Co-Founder & Technical Lead
Fintilligence • Remote Aug 2024 – Dec 2024
  • Built full-stack fintech analytics platform for detecting asymmetric information flow in financial markets
  • Implemented PIN/VPIN (Probability of Informed Trading/Volume-Synchronized PIN) measures for market microstructure analysis
  • Designed and deployed data acquisition pipelines processing Level 2 market data (order book, bid-ask spreads, trade flow)
  • Managed PostgreSQL database architecture for high-frequency trade and quote data storage
  • Coordinated small team (3 people) across research, development, and UI design
  • Built investment advisory system highlighting potential information asymmetries and trading opportunities
  • Tech stack: Python, pandas, PostgreSQL, Flask, AWS (EC2, S3), market data APIs
CTO (Contract)
LevEhat NGO • Remote Mar 2024 – Jul 2024
  • Led technical operations for civic tech nonprofit focused on volunteer coordination
  • Managed migration from Google Cloud Platform to AWS infrastructure (cost optimization)
  • Coordinated UI designers and developers (team of 5) for volunteer management platform
  • Oversaw database architecture for volunteer tracking, task assignment, and activity logging
  • Established development priorities and technical roadmap for platform evolution
  • Tech stack: AWS (EC2, S3), Python, PostgreSQL, React (managed, not hands-on)
AI Architect (Contract)
Stamina AI • Remote Jan 2023 – Dec 2023
  • Architected and deployed one of the early therapeutic chatbot systems for mental health support
  • Built complete LLM pipeline: prompt engineering, context management, response generation, safety guardrails
  • Integrated OpenAI GPT-3.5/4 APIs with custom safety layers and content filtering
  • Designed conversation state management and session handling for therapeutic context
  • Set up production infrastructure: API gateway, load balancing, monitoring, logging
  • Coordinated with consulting psychotherapists to ensure clinical appropriateness of responses
  • Managed small dev team (2-3 developers) implementing mobile and web interfaces
  • Implemented usage analytics and conversation quality monitoring dashboards
  • Tech stack: Python, OpenAI API, Flask, PostgreSQL, Redis (caching), AWS, Docker, Kubernetes
Independent Consultant
Various Clients • Remote 2022 – 2023
  • ML and data consulting for various clients: model development, pipeline architecture, statistical analysis
  • Projects included: time-series forecasting, text classification, data pipeline optimization
Senior Researcher
Spring Research • Remote 2020 – 2021
  • Developed ML models for trading signal generation using time-series analysis and statistical methods
  • Built data pipelines processing Level 2 market data (tick-by-tick, order book, market depth)
  • Researched topology-inspired approaches to market microstructure modeling
  • Implemented backtesting infrastructure for strategy evaluation
  • Collaborated with quantitative research team on experimental high-frequency strategies
  • Tech stack: Python, pandas, numpy, scikit-learn, AWS, PostgreSQL
Data Scientist
Nestlogic • Remote 2019
  • Built computer vision models for advertising creative optimization (image feature extraction)
  • Implemented A/B testing infrastructure using statistical hypothesis testing (t-tests, chi-square)
  • Deployed ML models to production on Google Cloud Platform with Kubernetes
  • Developed analytics dashboards tracking model performance and business KPIs
  • Tech stack: Python, PyTorch, OpenCV, GCP, Kubernetes, PostgreSQL
Data Scientist
Maverick Medical AI • Remote 2018
  • Developed NLP system for medical named entity recognition in clinical text using spaCy and BiLSTM
  • Built medical ontology framework for standardizing terminology across different hospital systems
  • Created decision support tools for clinical workflows highlighting critical findings
  • Worked within HIPAA compliance requirements for healthcare data
  • Tech stack: Python, spaCy, PyTorch, Flask, PostgreSQL, R, Mathematica
Full-Stack Engineer
Athena Portfolio Solutions • Remote 2017
  • Built full-stack financial NLP platform extracting signals from news and SEC filings
  • Developed Java backend services for data processing and entity recognition
  • Implemented entity linking system connecting market events to portfolio positions
  • Built sentiment analysis models for earnings calls and analyst reports
  • Created knowledge graph of financial entities (companies, people, events, relationships)
  • Tech stack: Java, Python, spaCy, NLTK, scikit-learn, Neo4j, R, Maxima
Independent Tutor & Consultant
Private Practice • Remote / International Relocation 2016 – 2017
  • Technical tutoring and consulting during international relocation period (Moscow → Prague → US)
  • Mathematics, statistics, programming, and computational linguistics instruction
  • ML/NLP consulting for various clients
Technical Tutor (Intermittent)
Private Practice • Remote 2017 – Present
  • Ongoing mathematics, statistics, programming, and computational linguistics tutoring
  • Students: high school through graduate level, plus professional colleagues
  • Peak activity during 2020 pandemic period

Technical Skills

LLM Engineering
OpenAI GPT-3.5/4, Anthropic Claude, prompt engineering, RAG pipelines, context management, safety guardrails, content filtering, production deployment, latency optimization, monitoring
Programming
Python (10+ years production), Clojure (10+ years), Elixir (2+ years), Rust (2+ years), Java, R, SQL (advanced), bash; additional background in Common Lisp, Haskell, Go
ML/AI
PyTorch, TensorFlow, Keras, spaCy, NLTK, Hugging Face Transformers, scikit-learn, XGBoost, LightGBM, CatBoost
Deep Learning
CNNs, RNNs, LSTMs, GRUs, Transformers, attention mechanisms, transfer learning, fine-tuning
Cloud
AWS (EC2, S3, SageMaker, Lambda), GCP (GCE, GCS, GKE), Docker, Kubernetes
Databases
PostgreSQL (expert), MongoDB, Redis, Neo4j, SQLite, MySQL
Data Engineering
Apache Spark, Apache Airflow, Kafka, ETL pipelines
Web & APIs
Flask, FastAPI, Django, REST APIs, microservices
DevOps
git, Linux, CI/CD (GitHub Actions, Jenkins), monitoring
Domain Expertise
  • LLM Production Systems: 3 years building therapeutic chatbots, safety systems, prompt engineering, RAG pipelines (Stamina AI)
  • Quantitative Finance: Level 2 market data, PIN/VPIN algorithms, order flow analysis, backtesting (Fintilligence, Spring Research)
  • Healthcare AI: Medical NER, clinical terminology, HIPAA compliance, decision support tools (Maverick Medical AI)
  • Financial NLP: SEC filings analysis, knowledge graphs, entity linking, sentiment analysis (Athena Portfolio Solutions)

Education

MSc Computational Linguistics 2016
Russian State University for the Humanities (RSUH) • Moscow, Russia
Statistical NLP, Machine Translation, Information Extraction
BSc Nuclear Physics 2011
Czech Technical University • Prague, Czech Republic
Mathematical Modeling, Statistical Analysis, Computational Physics

Languages

English (Fluent), Russian (Native), French (Conversational), German (Conversational), Czech (Conversational), Hebrew (Basic)