Principal Site Reliability Engineer

Walmart's ecommerce sites handle thousands of visitors and millions of transactions per day, and when something stops working, hundreds of thousands of dollars are lost. Imagine the activity, the transactions, and the data that flow through one of the largest ecommerce sites in the world. Do you have that in your mind? Now imagine what it might take to keep that site up, performing, and running efficiently. If you have a clear picture of that in your head, you might be the person we need for our SRE team! Together, our SREs manage a large-scale system made up of thousands of physical servers, request rates in the hundreds of thousands per second, and data measured in petabytes. The SRE team responds to production issues on a 24/7 basis, and you will bring energy and a relentless focus on continuous improvement to a fast-paced environment. If you can still picture all of this in your head, we should talk! The job: do whatever is necessary to maintain our high standard of customer experience.

Your Opportunity

As a Site Reliability Engineer on the Marketplace team at Walmart, you'll have the opportunity to:

· Enjoy working on challenges that no one has solved yet

· Influence engineering teams to design cloud-ready applications

· Be the first line of response for issues on one of the largest private cloud infrastructures

· Define the monitoring requirements needed to ensure the best customer experience

· Partner with other engineering teams to build the right tool set for delivering the best customer experience on the Walmart eCommerce site

Our Ideal Candidate

· A technically strong, high-performing individual with excellent communication skills, a strong customer focus, and an appetite to learn and deliver.

Key Qualifications

· 13-16 years of experience supporting infrastructure in a high-volume, customer-facing environment

· Ability to program in at least one language, ideally Python or Perl (Ruby, C/C++, Java, or others are also acceptable)

· Experience with Unix/Linux systems, including scripting in Shell, Perl, or Python

· Strong knowledge of core protocols and technologies such as TCP/IP, HTTP, DNS, load balancers, distributed file systems, and key-value and relational databases

· Extensive experience with configuration management tools such as Puppet, Chef, Salt, or Ansible

· Experience with specific software such as Hadoop, Kafka, Spark, Couchbase, and similar technologies is desirable, but the ability to quickly learn new technology is most important

· Able to conduct technical deep dives into code, networking, systems, and storage alongside very bright, experienced engineers

· Expertise in problem solving and in analyzing global-scale distributed systems.

· Logging and monitoring experience: designing, deploying, and running systems such as Splunk, ELK, New Relic, or other APM solutions

· Work with product delivery teams to identify architectural issues and ensure timely and smooth delivery of features into operations.

· Identify gaps in processes, skills, tooling, and technology choices, and work with upper management to drive improvements within the organization.

· Excellent written and verbal communication skills in order to influence architectural and process level change in the organization.

· BA/BS degree in Computer Science or related technical field, or equivalent practical experience.

Your Responsibilities

· Build and maintain Walmart's next-generation infrastructure platform

· Administer production infrastructure

· Drive improvements in all aspects of service delivery, including change management, continuous delivery, security, monitoring, and reliability

· Perform database administration in a mission-critical, 24/7 environment that includes e-commerce, accounting, warehouse management, and decision support systems

· Own the end-to-end availability and performance of mission-critical services, and build automation to prevent problems from recurring; automate the response to all non-exceptional service conditions

· Own the day-to-day health, uptime, monitoring, and reliability of services and server infrastructure

· Design, implement, and support high-performance, highly available services and infrastructure

· Improve the efficiency and flexibility of our datacenters

· Build and maintain models for growth and capacity planning

· Deploy, support, and monitor new platforms and application stacks

· Participate in the evaluation of new technologies and in the design and development of highly scalable distributed databases

· Explore and evaluate new technologies and solutions to push our capabilities forward, get ahead of our customers' needs, and motivate people to transform, innovate, and continually improve

Mandatory Skills

SRE, Development, Cloud



Bengaluru, Karnataka, India
