
Principal Site Reliability Engineer

AT&T


Location:
El Segundo, CA
Date:
11/22/2017
2017-11-222017-12-21
Job Code:
att4-6067243
Categories:
  • Engineering

Job Details

Company AT&T

Job Title Principal Site Reliability Engineer

Jobid att4-6067243

Location: El Segundo, CA, 90245, USA

Description AT&T is leading the way to the future for customers, businesses and
the industry. We’re developing new technologies to make it easier for people to
stay connected to their world. With a network that covers 225 countries and
serves more than 120 million customers, we’d say we’re well on our way. Together,
we’ve built a premier integrated communications company and an amazing place to
work and grow.



The AT&T Entertainment Group’s (AEG) Technology Team is at the
forefront of innovation, shaping the way our customers share and enjoy video
content. As part of our industry-leading team, you’ll leverage our extensive
networks to revolutionize the way people access their content at home or on the
go. From designing, implementing and deploying the software to supporting the
infrastructure that powers our video services, you’ll work on our DIRECTV OTT
and satellite TV platforms. We’ll look to you to improve the efficiency and
scalability of our systems, ensuring we continue to provide premier technology
to our customers.



DIRECTV is looking for a **Principal Site Reliability Engineer** to join our Operations team. The Video Operations
group is responsible for supporting on-air systems that power the DIRECTV
platform, including streaming on mobile devices, VOD and satellite TV; all of these systems
are fast, fault-tolerant and scalable. **The Operations team is responsible for the resiliency, swift response, performance and security of DIRECTV’s production infrastructure.**



+ We provide support to the Software Engineering teams and drive best practices for DIRECTV/AT&T’s products nationwide.

+ We partner with the development teams to optimize and operationalize their applications correctly.

+ We ensure systems are properly monitored, deployed and supported to provide the ultimate experience for our customers.

+ We realize that failure is inevitable, so we embrace it and plan for fast recovery.



As a Principal Site Reliability
Engineer, you’re curious, with deep technical knowledge. You’re a problem
solver and an engineer who uses ingenuity and technical leadership
to solve hard problems. You foster a culture of inquisitiveness, collaboration
and learning, and you are able to empathize with others. You adapt and
grow based on your experiences.



We’re looking for an influential decision-maker who’s ready to
take on a high level of ownership and responsibility; a forecaster and problem
solver for all of Operations.



**Responsibilities**



+ Define and verify standards for configuration, monitoring, reliability and performance

+ Serve as a subject matter expert for multiple proprietary and open-source technologies

+ Select and develop automation tools and scripts to improve the availability, manageability, scalability and operability of services

+ Provide expert perspective on the capabilities and limits of a combined cloud and multi-datacenter production infrastructure in software architecture designs

+ Solve performance and stability issues and prevent their recurrence

+ Define and evangelize cloud-related optimizations and best practices to improve reliability and performance



**Requirements**



+ Advanced administration knowledge of Unix/Linux systems.

+ Demonstrated experience writing code (Java, C++, Groovy, etc.).

+ Solid understanding of cloud computing services (AWS, Azure), with experience writing automation for cloud platforms.

+ Minimum two years of experience with scripting languages (Python, Shell, Perl, etc.).

+ Skilled in using automation for job efficiency.

+ Minimum two years of experience working with microservices and container technologies (Docker, Kubernetes, etc.).

+ Ability to identify the root causes of instability in a high-traffic, large-scale distributed system.

+ Ability to learn rapidly and communicate the value of new technologies to technical and non-technical audiences.

+ Meticulous and careful. You identify and consider all risks, and balance them against performing the task efficiently.

+ Thrive in a highly collaborative environment, with strong communication skills.



At AT&T, we’re bringing it all together. We deliver advanced mobile services, next-generation TV, high-speed internet and smart solutions for people and businesses. That’s why we’re investing to be the premier integrated communications company.
