~/penghian $ whoami
Mission-critical infrastructure that must not fail.
Penghian Ang is a Site Reliability and DevOps engineer based in Singapore, with 5+ years operating production infrastructure at scale: Tencent PUBG Mobile (6,000+ VMs, 40+ regions, up to 10M daily concurrent players), Crypto.com blockchain validator and RPC nodes across 9 protocols, and institutional custody automation at Coinbase.
// identity.json
$ cat work.log | tail -7
→ 7 entries, ordered desc · latest: coinbase, 2025-01
-
2025 — Now
Cold Storage Engineer III · Coinbase
- Automated a multi-step balance-restoration alert pipeline (Slack alert → parse → Datadog data → heuristic compliance rules → Glean internal checks → Messari external/hack checks), replacing manual investigation; retrieval pre-filtering cut token spend by US$500–1,000 per month per pipeline.
- Built pre-call automation that gathers institutional client context ahead of video verification calls (Calendly → Datadog → Snowflake → internal tools → Salesforce → compliance docs → Slack), so operators have all transaction, identity, and approval data ready when they join.
- Approved high-value institutional cold-storage transactions and conducted video verification under Coinbase Prime and Custody operational protocols.
- Operated 24×7 global on-call rotation under strict compliance and security controls; partnered with Security, Product, and Engineering on operational improvements.
- Python
- n8n
- Datadog
- Snowflake
- Salesforce
- Glean
- Messari
- Slack API
- AWS
-
2023 — 2025
Senior Blockchain Security DevOps Engineer · Crypto.com
- Owned all blockchain node infrastructure (full, archival, validator) across 9 protocols — TON, Conflux, Celestia, Oasis, Fetch.ai, NEO3, Neutron, ZetaChain, Vara — including upgrades and incident response.
- Improved node reliability through stress testing across the full request path (Azure Front Door → load balancer Nginx/Kong → blockchain node), identifying bottlenecks before they reached production.
- Built node deployment pipeline: Terraform-provisioned Azure VMs, Docker images via GitHub, deployed through Azure DevOps CI/CD with auto-pull of updated images.
- Consolidated multiple Cosmos-family Ansible playbooks into one template, cutting new-chain rollout time and standardizing upgrades.
- Wrote custom Prometheus exporters for chain-specific node health monitoring, visualized in Grafana.
- Owned proof-of-concept for secure developer access (VS Code → Teleport → VM).
- Terraform
- Docker
- Ansible
- Azure DevOps
- Nginx
- Kong
- Prometheus
- Grafana
- Teleport
-
2022 — 2023
DevOps Engineer · Tencent
- Sole DevOps point of contact for PUBG Mobile global production: 6,000+ VMs across 40+ regions and 6 cloud providers (Tencent Cloud, AWS, Azure, GCP, Huawei, Zenlayer) serving 4–10M daily concurrent players.
- Led release coordination for two major PUBG Mobile game patches, orchestrating rollout across global server fleets with cross-team go-live timing and verification.
- Partnered with business teams on capacity and cost planning (FinOps): forecast VM requirements ahead of holiday traffic peaks, scaling fleets up beforehand and back down afterward.
- Built an automated DDoS response system (state machine: block → disable server → re-enable → cleanup, integrated with BlueKing Job Platform) cutting manual intervention during attacks.
- Built FairPing, a QoS engine that equalizes per-player latency within a match using numpy outlier removal and iptables/tc traffic shaping.
- Reduced alerting from 600+ noisy rules to 140 symptom-based actionable alerts; introduced a new SLI for battle-join success rate after finding the team measured system latency rather than user-visible outcomes.
- Built internal FastAPI services for IP-whitelist management and real-time latency/monitoring tooling on Redis and BKData; managed fleet via Terraform and BlueKing CI/CD.
- Solution adopted as best-practice reference across teams by the Director of Tencent Overseas Games.
- Python
- FastAPI
- Terraform
- Redis
- BlueKing
- iptables/tc
- numpy
- Multi-cloud
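The DDoS response flow above (block → disable server → re-enable → cleanup) can be sketched as a small state machine. This is a minimal illustration, not the production system: the event names and the IDLE starting state are assumptions, and the BlueKing Job Platform integration is left out.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()             # assumed starting state, not named in the source
    BLOCKED = auto()          # attacking traffic blocked
    SERVER_DISABLED = auto()  # target server taken out of rotation
    RE_ENABLED = auto()       # server put back into rotation
    CLEANED_UP = auto()       # temporary block rules removed


# Hypothetical events: the real triggers are not described in the source.
TRANSITIONS = {
    (State.IDLE, "attack_detected"): State.BLOCKED,
    (State.BLOCKED, "attack_persists"): State.SERVER_DISABLED,
    (State.SERVER_DISABLED, "attack_ended"): State.RE_ENABLED,
    (State.RE_ENABLED, "health_check_passed"): State.CLEANED_UP,
}


def step(state, event):
    """Return the next state, or stay in place if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.IDLE):
    """Feed events through the machine and record every state visited."""
    history = [state]
    for event in events:
        state = step(state, event)
        history.append(state)
    return history
```

Keeping the transitions in one table means an out-of-order event leaves the state unchanged, so a server cannot be re-enabled before it has been disabled.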
-
2021 — 2022
DevOps Engineer · UP DevLabs
- Supported live production for games across AWS and Aliyun, including off-hours and midnight patch deployments via Jenkins CI/CD.
- Built repeatable deployment pipelines; implemented ELK for developer self-service log search, reducing ops support load.
- Established Grafana/Prometheus monitoring for system health and capacity planning.
- AWS
- Aliyun
- Jenkins
- ELK
- Grafana
- Prometheus
-
2021
System Analyst · Kenrich Partners
- Handled day-1 phishing incident response: coordinated with legal counsel (Rajah & Tann) and performed root cause analysis.
- Led MAS TRM compliance initiatives: asset tracking, system hardening, SIEM setup.
- Migrated on-prem infrastructure to Azure.
- MAS TRM
- SIEM
- Azure
- Incident response
-
2020 — 2021
SNOC Engineer · Government Technology Agency (GovTech)
- Monitored and maintained uptime for critical government systems using AWS CloudWatch, Grafana, Splunk, and SolarWinds.
- Supported real-time incident triage, root cause analysis, and cyber threat detection for national digital services.
- CloudWatch
- Splunk
- SolarWinds
- Grafana
-
2020
NOC Engineer · Netpluz Asia
- Provisioned and troubleshot circuits (Baccess, GPON, MetroE) and configured Cisco and Sophos network devices.
- Built Python automation tools to streamline operations, reducing manual ITSM reporting by 80%.
- Cisco
- Sophos
- Python
- Networking
$ tree -L 2 projects/
→ 6 entries · contracts + open source · github.com/angpenghian
-
2025
Site Reliability Engineer · Valigator
- Built production-grade Solana staking infrastructure for a white-glove validator service.
- Authored Ansible playbooks to automate validator provisioning, upgrades, and monitoring setup.
- Developed a custom Solana Node Exporter integrated with Prometheus and Grafana dashboards.
- Solana
- Ansible
- FastAPI
- Prometheus
- Grafana
-
2024 — 2025
Senior Infrastructure Support Engineer · Thoughtworks
- Led end-to-end build and maintenance of SNTC.org.sg, a public-good project supporting Singapore's special needs community.
- Designed and deployed enterprise-scale cloud infrastructure using Infrastructure as Code, improving scalability and resilience.
- Delivered government-requested website updates within 2 weeks, ensuring compliance and uninterrupted service.
- Partnered with CTO/CIO/COO stakeholders to align technical delivery with organizational goals.
- AWS
- Terraform
- IaC
- Compliance
$ ls open-source/ --gh
-
2025
Agent Times
Node.js / Express / Docker / SearXNG / SQLite on DigitalOcean. 28,000+ articles indexed across 1,100+ RSS feeds. First real USDC payment received on Base mainnet.
- Node.js
- Express
- Docker
- SearXNG
- x402
- Base
-
Ansible
solana-ansible-kit
Production-grade Ansible automation provisioning Solana validator fleets with security hardening, performance tuning, and zero-downtime upgrades.
- Ansible
- Solana
- Linux
-
CI/CD
solana-repro-builds
Automated CI/CD pipeline (GitHub Actions) publishing reproducible Solana validator binaries with hermetic Docker builds and checksum verification.
- GitHub Actions
- Docker
- Solana
-
Prometheus
solana-exporter
Prometheus exporter (Python/FastAPI) for Solana validator observability — exposes 26+ metrics, ships with a 21-panel Grafana dashboard.
- Python
- FastAPI
- Prometheus
- Grafana
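For context, a Prometheus exporter serves plain-text samples that Prometheus scrapes over HTTP. The sketch below renders gauges in the Prometheus text exposition format using only string formatting. It is not code from solana-exporter: the metric names and values are hypothetical, and the FastAPI serving layer is omitted.

```python
def render_metrics(samples):
    """Render gauge samples in the Prometheus text exposition format.

    `samples` maps a metric name to a (help text, value) pair.
    """
    lines = []
    for name, (help_text, value) in sorted(samples.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric names and values, for illustration only.
example = {
    "solana_validator_slot_height": ("Latest slot seen by the node.", 250000000),
    "solana_validator_is_delinquent": ("1 if the validator is delinquent.", 0),
}
print(render_metrics(example), end="")
```

An exporter would typically return this string from a `/metrics` endpoint for Prometheus to scrape on a schedule.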
$ ls skills/
→ 8 categories · sorted by relevance
- reliability_sre/
SLO/SLI definition · error budgets · incident command · blameless postmortems · on-call rotation · stress/load testing · capacity & cost planning (FinOps) · disaster recovery (RTO/RPO)
- cloud_iac/
AWS · Azure · GCP · Tencent Cloud · Aliyun · Huawei Cloud · Terraform · Ansible
- containers/
Docker · Kubernetes (CKA) · Helm · GitHub Actions · Azure DevOps · Jenkins · ArgoCD
- observability/
Prometheus · Grafana · Datadog · ELK · Splunk · PagerDuty · custom exporters
- scripting/
Python (FastAPI, Flask) · Bash · n8n
- networking/
Nginx · Kong · load balancing · Azure Front Door · iptables/tc · DNS · TCP/IP
- blockchain/
Solana · TON · Cosmos (Neutron, Fetch.ai) · Vara · Celestia · Oasis · Conflux · NEO3 · ZetaChain · validator and RPC node operations
- security/
SIEM · MAS TRM · segregation of duties · incident response · root cause analysis · post-mortems
$ openssl x509 -text -in certs/*
→ 8 valid certificates
- cka
Certified Kubernetes Administrator
- terraform
HashiCorp Certified — Terraform Associate
- aws_saa
AWS Certified Solutions Architect — Associate
- aws_ccp
AWS Certified Cloud Practitioner
- tencent_sa
Tencent Cloud Solutions Architect — Associate
- tencent_sys
Tencent Cloud SysOps — Associate
- az_900
Microsoft Certified — Azure Fundamentals
- sophos
Sophos Certified Engineer
$ cat education.txt
→ 3 entries
- B.Sc.
University of Wollongong — coursework toward B.Sc. Computer Science (Digital Systems Security)
- Diploma
Diploma in Infocomm and Digital Media (GPA 3.9) — Temasek Polytechnic
- Higher Nitec
Higher Nitec in Information Technology — ITE College West
$ ping penghian
→ available for SRE and infrastructure work · avg response 12h
penghian@gmail.com · linkedin.com/in/angpenghian · github.com/angpenghian · resume.pdf