What You'll Do
Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and incident analysis.
Lead incident management efforts, perform root cause analysis, and implement action-based improvements.
Implement operational workflows using scripting, infrastructure as code (IaC), and configuration management tools.
Manage capacity, performance, and scaling solutions to forecast demand and optimize infrastructure.
Collaborate with engineering teams to embed operability, resilience, and security into application architectures.
Build and automate reliable deployments through CI/CD pipelines, release governance, and version control systems.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience Required
Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and monitoring.
Experience with cloud platforms (AWS, Azure, or Google Cloud Platform) across compute, storage, networking, IAM, and cost governance.
Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, Elastic/ELK.
Strengthening security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secure management of workloads.
Infrastructure automation experience using Terraform, CloudFormation, Ansible, or similar tools.
Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation using Bash, PowerShell, or Python.
Experience in electricity, engineering, or military-related environments (preferred).
Good to Have
Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
Experience with legacy applications, IBM iSeries, and/or library systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Prior experience as a major incident commander, stakeholder communicator, or ops lead/coordinator.
Experience with ITIL and ServiceNow (change, incident, and configuration management).