Responsibilities
1. Participate in the development and maintenance of market data ingestion and order matching systems with nanosecond-level latency requirements, ensuring stability and performance under extreme load conditions;
2. Conduct performance modeling and analysis across modules of the trading system (e.g., data path, memory access patterns, cache hit rate, context switch overhead), and drive the visualization of system metrics;
3. Participate in the development and tuning of a custom DPDK-based network protocol stack that leverages kernel bypass;
4. Contribute to the design and optimization of low-latency communication frameworks, including:
   - Network protocol stacks (TCP/UDP, WebSocket, HTTP/2);
   - Inter-process/thread communication (shared memory, lock-free ring buffers);
   - User-space timestamp precision and synchronization mechanisms;
5. Profile performance hotspots in the Linux kernel (e.g., in the scheduler, network stack, and I/O subsystems), assess their impact on low-latency trading, and apply targeted optimizations informed by CPU microarchitecture characteristics.
⸻
Requirements
Fundamental & General Skills
- Proficient in modern C++ (C++17/20) with strong system-level high-performance programming skills (e.g., cache-friendly data structure design, zero-copy data pipelines, CPU affinity binding);
- Solid understanding of CPU architecture (x86_64), cache hierarchy, branch prediction, pipeline stalls, and other hardware-level behaviors;
- In-depth experience in at least one of the following areas:
  - Analysis and optimization of core Linux kernel components such as the network stack, scheduler, and memory subsystem;
  - CPU microarchitecture analysis using tools such as perf and Intel VTune, or the Top-Down analysis method;
  - SIMD vectorization (e.g., AVX2/AVX-512) or custom assembly-level optimization;
  - Development of system-level latency profiling or microbenchmark toolchains.
Networking & Protocol Stack Development
- Deep understanding of the WebSocket protocol with the ability to implement custom frame parsing and construction;
- Familiarity with I/O multiplexing mechanisms like epoll and io_uring;
- Experience independently building high-performance WebSocket or FIX/SBE protocol clients;
- Experience with at least one user-space network stack technology such as netmap or DPDK.
High-Frequency Trading & Data Processing
- Familiarity with tick data processing workflows in high-frequency trading environments, including market data reconstruction and reordering mechanisms;
- Knowledge of PTP and TSC-based clock synchronization and system clock stability tuning;
- Knowledge of the network topologies and direct-connectivity latency characteristics of major global exchanges is preferred.
⸻
Bonus Points
- Hands-on experience in end-to-end trading system latency optimization (from drivers, interrupts, protocol framing to user-space scheduling);
- Proficiency with performance analysis tools such as perf, ftrace, bpftrace, VTune, and cachegrind;
- Experience designing and tuning in-house user-space lock-free message queues or shared memory pools.