Tracxn – Tech – Lead Site Reliability Engineer (Lead SRE) (4-10 Yrs)

Tracxn Technologies Pvt Ltd.
Bangalore

Job description

Tracxn is looking for experienced and motivated professionals to play a vital role in developing, scaling, and automating its IT infrastructure. As a Lead SRE, you will get hands-on experience with current technologies such as Ansible, AWS, Docker, shell scripting, Python, NodeJS, Kafka, Zookeeper, MongoDB, MySQL, Elasticsearch, Redis, Spring, and the ELK Stack.

The person in this role will demonstrate a strong focus on tactical operations, as well as large-scale production engineering and orchestration.

What we are looking for:

  • Experience with IaC tools (Puppet, Ansible, Chef, etc.)
  • Experience in configuring and managing enterprise monitoring and resource tracking systems
  • Ability to automate operations
  • Expertise in at least one scripting language
  • Experience with version control tools like Git
  • Ability to use a wide variety of open source technologies and cloud services (AWS, Azure, GCP)
  • In-depth knowledge of system, network, and application security principles and practices

Bonus

  • Experience with containers and orchestration (Docker, Kubernetes)
  • Experience with infrastructure and configuration automation (Terraform, SaltStack)
  • Understanding of protocols/technologies like HTTP, SSL, LDAP, SSH, SAML, etc.
  • Systems fluency (Linux, storage, networking)
  • Experience with modern software components (MongoDB, Redis, Elasticsearch, Kafka)
  • In-depth knowledge of operating systems (processes, threads, concurrency issues, locks, mutexes, semaphores, monitors and how they work)
  • Experience in software-automation production systems (like Jenkins)
  • Expertise in software development methodologies

Key Deliverables

  • Build and lead a great team by example
  • Learn and develop leadership skills
  • Design and develop our AWS infrastructure
  • Develop and manage the infrastructure as code using Ansible
  • Implement automation tools and frameworks (CI/CD pipelines)
  • Optimize Tracxn's computing architecture
  • Conduct systems tests for security, performance, and availability; monitor unit performance
  • Keep the customer-facing services available at top performance by using proactive monitoring tools and maintaining the constant health of the supporting systems.
  • Develop and maintain design and troubleshooting documentation
  • Drive RCA (Root Cause Analysis) for high priority incidents and work with respective development teams on preventive measures.
  • Automate detection and resolution of recurring issues in the production environment
  • Provide operational management information by collecting, analyzing, and summarizing operating and engineering data and trends.

What we have to offer

  • Work with a performance-oriented team driven by ownership and open to experiments.
  • Learn to design a system for high accuracy, efficiency, and scalability.
  • No strict deadlines; focus on delivering quality work.
  • Meritocracy driven, candid culture. No politics.
  • Very high visibility regarding which startups and markets are exciting globally

Please note: candidates should be ready to work remotely until the pandemic subsides.
