penghian@coldstorage — angpenghian.com — 80×24

~/penghian $ whoami

Mission-critical infrastructure that must not fail.

Penghian Ang is a Site Reliability and DevOps engineer with 5+ years operating production infrastructure at scale: Tencent PUBG Mobile (6,000+ VMs, 40+ regions, up to 10M daily concurrents), Crypto.com blockchain validator and RPC nodes across 9 protocols, and institutional custody automation at Coinbase. Based in Singapore.

// identity.json

status  ok · on call · Singapore (UTC+8)
role    Cold Storage Engineer III @ coinbase
focus   incident response · SLO/SLI · capacity & cost planning · observability
chains  ton · cosmos · celestia · oasis · conflux · neo3 · zetachain · solana
certs   cka · terraform · aws-saa · aws-ccp · tencent×2 · az-900 · sophos
lang    en · 中文

$ cat work.log | tail -7

7 entries, ordered desc · latest: coinbase, 2025-01

  1. 2025 — Now

    Cold Storage Engineer III · Coinbase

    • Automated a multi-step balance-restoration alert pipeline (Slack alert → parse → Datadog data → heuristic compliance rules → Glean internal checks → Messari external/hack checks), replacing manual investigation; retrieval pre-filtering cut token spend by US$500–1,000 per month per pipeline.
    • Built pre-call automation that gathers institutional client context ahead of video verification calls (Calendly → Datadog → Snowflake → internal tools → Salesforce → compliance docs → Slack), so operators have all transaction, identity, and approval data ready before joining.
    • Approved high-value institutional cold-storage transactions and conducted video verification under Coinbase Prime and Custody operational protocols.
    • Operated 24×7 global on-call rotation under strict compliance and security controls; partnered with Security, Product, and Engineering on operational improvements.
    • Python
    • n8n
    • Datadog
    • Snowflake
    • Salesforce
    • Glean
    • Messari
    • Slack API
    • AWS
  2. 2023 — 2025

    Senior Blockchain Security DevOps Engineer · Crypto.com

    • Owned all blockchain node infrastructure (full, archival, validator) across 9 protocols — TON, Conflux, Celestia, Oasis, Fetch.ai, NEO3, Neutron, ZetaChain, Vara — including upgrades and incident response.
    • Improved node reliability through stress testing across the full request path (Azure Front Door → load balancer Nginx/Kong → blockchain node), identifying bottlenecks before they reached production.
    • Built node deployment pipeline: Terraform-provisioned Azure VMs, Docker images via GitHub, deployed through Azure DevOps CI/CD with auto-pull of updated images.
    • Consolidated multiple Cosmos-family Ansible playbooks into one template, cutting new-chain rollout time and standardizing upgrades.
    • Wrote custom Prometheus exporters for chain-specific node health monitoring, visualized in Grafana.
    • Owned proof-of-concept for secure developer access (VS Code → Teleport → VM).
    • Terraform
    • Docker
    • Ansible
    • Azure DevOps
    • Nginx
    • Kong
    • Prometheus
    • Grafana
    • Teleport
  3. 2022 — 2023

    DevOps Engineer · Tencent

    • Sole DevOps point of contact for PUBG Mobile global production: 6,000+ VMs across 40+ regions and 6 cloud providers (Tencent Cloud, AWS, Azure, GCP, Huawei, Zenlayer) serving 4–10M daily concurrent players.
    • Led release coordination for two major PUBG Mobile game patches, orchestrating rollout across global server fleets with cross-team go-live timing and verification.
    • Partnered with business teams on capacity and cost planning (FinOps): forecast VM requirements ahead of holiday traffic peaks so fleets could scale up before each peak and back down afterward.
    • Built an automated DDoS response system (state machine: block → disable server → re-enable → cleanup, integrated with BlueKing Job Platform) cutting manual intervention during attacks.
    • Built FairPing, a QoS engine that equalizes per-player latency within a match, using numpy outlier removal and iptables/tc traffic shaping.
    • Reduced alerting from 600+ noisy rules to 140 symptom-based actionable alerts; introduced a new SLI for battle-join success rate after finding the team measured system latency rather than user-visible outcomes.
    • Built internal FastAPI services for IP-whitelist management and real-time latency/monitoring tooling on Redis and BKData; managed fleet via Terraform and BlueKing CI/CD.
    • Solution adopted as best-practice reference across teams by the Director of Tencent Overseas Games.
    • Python
    • FastAPI
    • Terraform
    • Redis
    • BlueKing
    • iptables/tc
    • numpy
    • Multi-cloud
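
The DDoS response flow in this entry (block → disable server → re-enable → cleanup) can be pictured as a small linear state machine. The sketch below is illustrative only and is not the production system: the `IDLE` start/end state, the `DdosResponse` class, and the `run_job` callback (a stand-in for a job-platform call) are all assumptions introduced for the example.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"  # assumed resting state, not named in the entry above
    BLOCK = "block"
    DISABLE_SERVER = "disable_server"
    RE_ENABLE = "re_enable"
    CLEANUP = "cleanup"


# Transitions follow the order listed in the entry; IDLE closes the loop.
NEXT_PHASE = {
    Phase.IDLE: Phase.BLOCK,
    Phase.BLOCK: Phase.DISABLE_SERVER,
    Phase.DISABLE_SERVER: Phase.RE_ENABLE,
    Phase.RE_ENABLE: Phase.CLEANUP,
    Phase.CLEANUP: Phase.IDLE,
}


class DdosResponse:
    """Tracks one target server through the response phases."""

    def __init__(self, server_id, run_job):
        self.server_id = server_id
        # Hypothetical callable: (phase_name, server_id) -> bool (True on success).
        self.run_job = run_job
        self.phase = Phase.IDLE
        self.history = []

    def advance(self):
        """Run the job for the next phase; move forward only if it succeeds."""
        nxt = NEXT_PHASE[self.phase]
        if nxt is not Phase.IDLE:
            if not self.run_job(nxt.value, self.server_id):
                # Stay in the current phase so a retry or an operator can step in.
                return self.phase
        self.phase = nxt
        self.history.append(nxt)
        return self.phase


def fake_job_runner(phase_name, server_id):
    """Stand-in for a real job-platform call; always reports success."""
    return True


if __name__ == "__main__":
    response = DdosResponse("game-srv-01", fake_job_runner)
    while response.advance() is not Phase.IDLE:
        pass
    print([p.value for p in response.history])
```

Keeping the transitions in a single table makes the allowed order explicit, and a failed job leaves the machine in its current phase rather than skipping ahead.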
  4. 2021 — 2022

    DevOps Engineer · UP DevLabs

    • Supported live production for games across AWS and Aliyun, including off-hours and midnight patch deployments via Jenkins CI/CD.
    • Built repeatable deployment pipelines; implemented ELK for developer self-service log search, reducing ops support load.
    • Established Grafana/Prometheus monitoring for system health and capacity planning.
    • AWS
    • Aliyun
    • Jenkins
    • ELK
    • Grafana
    • Prometheus
  5. 2021

    System Analyst · Kenrich Partners

    • Handled day-1 phishing incident response: coordinated with legal counsel (Rajah & Tann) and performed root cause analysis.
    • Led MAS TRM compliance initiatives: asset tracking, system hardening, SIEM setup.
    • Migrated on-prem infrastructure to Azure.
    • MAS TRM
    • SIEM
    • Azure
    • Incident response
  6. 2020 — 2021

    SNOC Engineer · Government Technology Agency (GovTech)

    • Monitored and maintained uptime for critical government systems using AWS CloudWatch, Grafana, Splunk, and SolarWinds.
    • Supported real-time incident triage, root cause analysis, and cyber threat detection for national digital services.
    • CloudWatch
    • Splunk
    • SolarWinds
    • Grafana
  7. 2020

    NOC Engineer · Netpluz Asia

    • Provisioned and troubleshot circuits (Baccess, GPON, MetroE) and configured Cisco and Sophos network devices.
    • Built Python automation tools to streamline operations, reducing manual ITSM reporting by 80%.
    • Cisco
    • Sophos
    • Python
    • Networking

$ tree projects/ --max-depth=2

6 entries · contracts + open source · github.com/angpenghian

  1. 2025

    Site Reliability Engineer · Valigator (independent contract)

    • Built production-grade Solana staking infrastructure for a white-glove validator service.
    • Authored Ansible playbooks to automate validator provisioning, upgrades, and monitoring setup.
    • Developed a custom Solana Node Exporter integrated with Prometheus and Grafana dashboards.
    • Solana
    • Ansible
    • FastAPI
    • Prometheus
    • Grafana
  2. 2024 — 2025

    Senior Infrastructure Support Engineer · Thoughtworks (independent contract)

    • Led end-to-end build and maintenance of SNTC.org.sg, a public-good project supporting Singapore's special needs community.
    • Designed and deployed enterprise-scale cloud infrastructure using Infrastructure as Code, improving scalability and resilience.
    • Delivered government-requested website updates within 2 weeks, ensuring compliance and uninterrupted service.
    • Partnered with CTO/CIO/COO stakeholders to align technical delivery with organizational goals.
    • AWS
    • Terraform
    • IaC
    • Compliance

$ ls open-source/ --gh

  1. 2025

    Agent Times — x402 news API for AI agents

    Node.js / Express / Docker / SearXNG / SQLite on DigitalOcean. 28,000+ articles indexed across 1,100+ RSS feeds. First real USDC payment received on Base mainnet.

    • Node.js
    • Express
    • Docker
    • SearXNG
    • x402
    • Base
  2. Ansible

    solana-ansible-kit

    Production-grade Ansible automation provisioning Solana validator fleets with security hardening, performance tuning, and zero-downtime upgrades.

    • Ansible
    • Solana
    • Linux
  3. CI/CD

    solana-repro-builds

    Automated CI/CD pipeline (GitHub Actions) publishing reproducible Solana validator binaries with hermetic Docker builds and checksum verification.

    • GitHub Actions
    • Docker
    • Solana
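
As an illustration of the checksum-verification step mentioned for solana-repro-builds, a consumer of a published binary might compare its digest against the published value. This is a minimal sketch, not the project's actual pipeline: the use of SHA-256 and the function names `sha256_of_file` and `verify_checksum` are assumptions made for the example.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """Return True if the file's digest matches the published hex digest."""
    actual = sha256_of_file(path)
    # Normalise whitespace and case before a constant-time comparison.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

With reproducible builds, two independent builds of the same source should yield the same digest, so a check like this can be run against a locally rebuilt binary as well as a downloaded one.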
  4. Prometheus

    solana-exporter

    Prometheus exporter (Python/FastAPI) for Solana validator observability — exposes 26+ metrics, ships with a 21-panel Grafana dashboard.

    • Python
    • FastAPI
    • Prometheus
    • Grafana
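
For readers unfamiliar with what a Prometheus exporter emits, the sketch below renders one gauge in the Prometheus text exposition format using only the standard library. It is not the solana-exporter code (which the entry above describes as Python/FastAPI); the `render_gauge` helper and the metric and label names in the usage example are hypothetical.

```python
def _escape(value):
    """Escape a label value: backslash, double quote, and newline."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs; labels_dict may be empty.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            body = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{body}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric and label names, for illustration only.
    print(render_gauge(
        "demo_validator_slot_height",
        "Latest slot observed by the node.",
        [({"cluster": "testnet"}, 42)],
    ), end="")
```

An exporter would serve text like this from a `/metrics` HTTP endpoint for Prometheus to scrape; in practice a client library usually handles the rendering.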

$ ls skills/

8 categories · sorted by relevance

  • reliability_sre/

    SLO/SLI definition · error budgets · incident command · blameless postmortems · on-call rotation · stress/load testing · capacity & cost planning (FinOps) · disaster recovery (RTO/RPO)

  • cloud_iac/

    AWS · Azure · GCP · Tencent Cloud · Aliyun · Huawei Cloud · Terraform · Ansible

  • containers/

    Docker · Kubernetes (CKA) · Helm · GitHub Actions · Azure DevOps · Jenkins · ArgoCD

  • observability/

    Prometheus · Grafana · Datadog · ELK · Splunk · PagerDuty · custom exporters

  • scripting/

    Python (FastAPI, Flask) · Bash · n8n

  • networking/

    Nginx · Kong · load balancing · Azure Front Door · iptables/tc · DNS · TCP/IP

  • blockchain/

    Solana · TON · Cosmos (Neutron, Fetch.ai) · Vara · Celestia · Oasis · Conflux · NEO3 · ZetaChain · validator and RPC node operations

  • security/

    SIEM · MAS TRM · segregation of duties · incident response · root cause analysis · post-mortems

$ openssl x509 -text -in certs/*

8 valid certificates

  • cka

    Certified Kubernetes Administrator

  • terraform

    HashiCorp Certified — Terraform Associate

  • aws_saa

    AWS Certified Solutions Architect — Associate

  • aws_ccp

    AWS Certified Cloud Practitioner

  • tencent_sa

    Tencent Cloud Solutions Architect — Associate

  • tencent_sys

    Tencent Cloud SysOps — Associate

  • az_900

    Microsoft Certified — Azure Fundamentals

  • sophos

    Sophos Certified Engineer

$ cat education.txt

3 entries

  • B.Sc.

    University of Wollongong — coursework toward B.Sc. Computer Science (Digital Systems Security)

  • Diploma

    Diploma in Infocomm and Digital Media (GPA 3.9) — Temasek Polytechnic

  • Higher Nitec

    Higher Nitec in Information Technology — ITE College West

$ ping penghian

available for SRE and infrastructure work · avg response 12h

penghian@gmail.com  ·  linkedin.com/in/angpenghian  ·  github.com/angpenghian  ·  resume.pdf