National Labor Exchange Veterans Jobs

Job Information

Microsoft Corporation Principal Software Engineering Manager in Bellevue, Washington

The Citizens Application Platform (CAP) Engineering Quality team develops and delivers processes, frameworks, tools and services for the larger CAP organization in an effort to drive a Site-Up culture, reduce toil, and deliver an overall better experience for customers of the Dynamics 365 ecosystem.

To make this work for our customers, we need continual effort to make that delivery reliable. In our never-ending quest to drive reliability, we need YOU -- someone who is a passionate Site Reliability Engineering (SRE) Manager.

SREs are people who take engineering-based approaches to solve operations problems: we like infrastructure, we like seeing how big complicated things work, and most importantly, we gain great satisfaction from making it better. We have backgrounds in lots of things -- Computer Science, System Administration, Networking, Mathematics, and Engineering generally, but you can also find folks who've worked in Physics, Chemistry, Biology, Statistics, and even English.

SREs build, monitor, and maintain the sites, systems and infrastructure that ensure our customers can quickly access their data and run workloads whenever and wherever they need to. We identify service problems and areas for improvement, and we follow up by fixing those problems. Our work is key to the success of many of the Microsoft services you will have heard of, and a number you haven't. There are very few bits of Microsoft which aren't touched by SREs in some way or other.

With a focused lens of reliability, scalability and performance at the forefront, our SRE team is responsible for a Site-Up mentality and reliability across the product stack. To deliver the best for Microsoft’s customers we uphold daily Live Site principles, processes and procedures and drive DevOps integration between teams. All of this enables our shared vision of empowering every person and organization on the planet to achieve more.

The leader of this team will drive an embedded SRE model across the organization, with a focus on empowering each SRE and engineering team to achieve more, by focusing our efforts around a Site-up culture, reducing toil, and driving a self-healing service to return time to all engineers.

Join the CAP-EQ team today if you want to drive positive impact for the millions of D365 and Power Platform customers.


The scale of our operations is enormous. Microsoft's products and services are overwhelmingly consumed online, and billions of people use them every day. We need people who enjoy analyzing complicated problems, coming up with creative solutions, and working in focused teams to build things no one has thought of before, all in the service of production reliability.

If you are excited by this type of challenge, and you love to work in groups of people who are similarly excited, come join us. We value the input of people who aren't afraid to be learning all the time, who celebrate mistakes because they show the way forward, and who are happy to continuously improve. We strongly believe that diverse experiences and backgrounds, and an environment where everyone can feel safe to contribute their own insights in a data-driven, objective, but supportive way, are the key to making the best workplace possible, and the best workplace makes the best products and services. Not only is it the smart thing, it's the right thing.

Primary Responsibilities:

  • Lead with a Livesite focus; everything else is secondary

  • Drive and uphold a Site-Up culture

  • Lead a Site Reliability Engineering (SRE) team as we transition into a modern engineering organization (agile, DevOps, VSO, CI/CD, cloud computing, etc.)

  • Focus on service health visuals utilizing telemetry, for a dial tone, hands-off, self-healing environment

  • Be a key influencer in the evolution of the SRE discipline across CAP and actively participate in the engineering leadership team, maintaining engineering discipline with a DevOps culture for maximum efficiency

  • Focus heavily on driving efficiencies through automation and tools: if it is done manually today, you are passionate about automating it!

  • Collaborate continuously with Program Managers, Software Engineering and our business users in order to ensure the reliability, availability and performance of our services including internal services and external integrations

  • Operate and deliver within a SaaS-oriented environment

  • Apply full understanding of the business, the customer, and the solutions that a business offers to effectively design, develop, and implement operational capabilities, tools and processes that enable highly available, scalable & reliable customer experiences.

  • Build business continuity and disaster recovery plans, and ensure they are tested regularly

  • Develop high quality monitoring and health reporting solutions that address our live site needs and as much as possible reflect the actual user or customer experience.

  • Consistently find ways to improve organizational health, push a growth mindset, and improve diversity and inclusion within the organization.

  • Some travel required: 1 to 2 trips to Hyderabad annually

We would like to talk to you if you:

  • Are interested in distributed systems and working with high-scale services.

  • Like to work in a fast-moving environment and you aren't afraid to change things to make them better.

  • Enjoy new technological challenges and solving hard problems.

  • Believe that a team working well together is truly smarter than the single smartest person on that team.

  • Aspire to grow as a person, as a teammate, and as an engineer.


We try not to have too many formal qualifications, since mindset and demonstrated ability are more important, but previous successful candidates have often had some or all of the following:

  • Background in Computer Science equivalent to a B.Sc.

  • 5+ years of software development, with automation-related experience valued in particular. Scripting languages such as Bash, Python, and PowerShell, or compiled languages such as C or C#, are most relevant, but others are acceptable.

  • Awareness of, and ability to reason about, modern software & systems architectures, including load-balancing, queueing, caching, distributed systems failure modes generally, microservices, and so on.

  • Associated troubleshooting skills, including the ability to follow RPC call-chains across arbitrary network steps. Consequent understanding of monitoring in distributed systems.

  • Deep understanding of operating system level concepts such as processes, memory allocation, and the network stack; understanding of how applications are affected by the above, and ability to debug same.

  • Experience with working in a team, including coordinating large projects, communicating well, and exercising initiative when presented with problems.

  • Practical experience running large scale online systems is always an advantage.

  • Strong customer focus with ability to work effectively across multiple business and technical teams to ensure continued customer happiness.

  • Demonstrated ability to lead and influence across matrixed organizations

  • Ability to manage several concurrent priorities and team workloads while delivering in an agile fashion.

  • Strong people skills to work cross-functionally with engineers, software developers, and leadership.

  • Demonstrated ability to seek out and leverage data driven insights that foster continuous improvement.

  • A Livesite focus and a proven ability to deliver in a DevOps environment

  • Technically solid, creative and with the ability to operationalize solutions at scale, with software

  • Demonstrated comfort in a frequently changing and ambiguous landscape as we move quickly to meet our user and customer expectations.

  • Demonstrated continuous learner, always looking to expand your knowledge and experiences that enable you to consistently drive high quality impact to Microsoft and for our customers.

  • Constant curiosity: expand your understanding of your services and how they light up success for our users and customers. Leverage what you learn to continuously improve yourself, your services and those around you!

Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form at .

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.