As a DevOps Engineer within our IT group, you will collaborate in an agile environment with top-notch developer and operations teams to build the cloud infrastructure that runs our business systems. You will work with creative individuals responsible for delivering best-of-breed web and mobile products to our customers. Your role will be critical in building, extending, maintaining, and observing our infrastructure and platform, with a constant focus on improving team velocity, reliability, and scalability.
Essential Duties and Responsibilities:
- Build effective, stable, and reliable infrastructure, tools, and services.
- Manage, maintain, and monitor Linux-based systems for our highly available, public-facing applications and internal services. Support and monitor Windows-based database systems.
- Evaluate hardware and software, run benchmarks, and perform capacity planning for existing and future deployments.
- Schedule and track maintenance, patching, and upgrades of hardware and software.
- Create and continuously improve our CI/CD pipelines.
- Monitor applications and ensure that required Service Level Agreements (SLAs) are met.
- Foster cross-team collaboration by participating in daily team stand-ups, facilitating incident resolution, and proactively strengthening site reliability.
- Contribute documentation and operational support for disaster recovery and business continuity planning.
- Participate in an on-call rotation ensuring 24/7 support for cloud production systems and networks.
- Be smart, learn quickly, and fail fast.
Qualifications:
- Some scripting experience.
- Familiarity with cloud automation tooling such as Packer and Terraform.
- Flexible hours and an occasional on-call rotation.
- Strong written and oral communication skills.
- Strong time-management skills and a detail-oriented work ethic.
- Strong values of ownership, personal development, and transparency.
- Ability to thrive in a team environment and effectively work with diverse groups of coworkers.
- Experience automating and orchestrating distributed systems.
- Ability to troubleshoot application and database environments and contribute to improving their efficiency.
- A firm grasp of basic networking concepts, including routing, subnets, and firewalls.
- Strong understanding of Internet technologies, including DNS, VPN, SSH/SSL, load balancing, and security.
- Knowledge of and experience supporting disaster recovery and contingency planning to help minimize business-impacting risks.
- Experience with virtualization platforms and their underlying storage systems.
- Experience with containerization and container orchestration.