Code What Matters: Develop Product, Not Ops

As highly skilled problem solvers, developers can quickly be inundated with tasks well beyond the scope of building product. It can be a slippery slope: a manager might feel she is just asking for a couple of small one-off tasks, but then the requests expand. This, in turn, starts to devour developer productivity.

Developers should code what matters, and focus on building product.

Let’s examine that slippery slope and offer some tips on how to handle it so that developers can remain productive.

Without an Operations Team

To start, let’s focus on teams that don’t yet have an ops support team in place. If you have worked in this environment, you know that it is far from the ideal setup. Instead of spending time coding, developers spend their time tinkering with server configurations, squeezing more performance out of a database, staying on top of security patches for their stack, and more. All of that comes at the expense of doing what they were hired to do: ship code.

While having a small team is useful for some companies depending on their stage, it quickly becomes clear that someone needs to take on an operations-oriented role. Either a developer starts splitting their time between development and operations tasks, a new hire joins the team, or the company outsources the function.

Regardless of how they choose to solve the problem of not having an ops support team, software companies very typically end up with two groups of people: one which is responsible for developing the software, the other which is responsible for managing the operations for that software.

Many companies end up stuck with developers handling operations simply because they are getting by.

With an Operations Team

If you are in a position to have a dedicated ops team, ideally they approach operations with a developer mindset. This is often where confusion about DevOps comes in. DevOps doesn’t mean having your product developers do operations tasks; it means applying the same process and rigor to managing operations as developers do in writing software.

Operations should collaborate with development and should be continually improving.

Collaborate with Development

Operations should provide guidance to development on constraints and developers should push back on those constraints when justified.  This give-and-take approach allows both teams to focus on delivering and operating the best product experience possible.

This collaboration isn’t a one-off event; it should exist through the entire process, from conception through maintenance. Because of this strong collaboration, it is important that your operations team can understand and appreciate how software is written. Sometimes digging into a production issue requires diving into the code.

It is important for your operations team to not only understand operations best practices but also have a background in software development. This enables them to cross the line into the code to assist development teams in debugging production issues, isolating performance problems, etc.

Being able to collaborate with development teams in such a deep and meaningful way will supercharge the productivity of building a great product.

Continually Improve

It is important for all groups in an organization to operate with a sense of continuous improvement or Kaizen. An operations team can do this in large part by borrowing best practices from development teams.

Using development processes like Scrum to manage the backlog of work can help with triage and prioritization. Using source control for scripts and other artifacts helps to track change. Performing peer reviews on those changes enables team growth and shared context. Following the DRY (don’t repeat yourself) principle provides repeatability and efficiency.

Doing these things, where an operations team applies software development principles to operations, is where we get the term DevOps. It is important both to have an operations team apart from your developers and for that operations team to be “doing DevOps”.

Code What Matters

When you are just starting out, you are most likely not worried about operations at all. As you get ready to launch, it’s typical for developers to just figure out how to get things running. It’s easy to forget about continued operations, especially if your original developers have moved on to other projects or companies.

Hiring a dedicated operations team can be expensive and is just not practical until you reach a certain size.

This is where Eldarion Cloud fits the bill.

We can provide that dedicated operations team that collaborates with your development team in deep and meaningful ways because, as well as our experience with operations, we are a team of seasoned developers that understand how to ship product.

If you want to unburden your development staff from operations tasks, we would love to chat with you to see how we could help make that happen so they can code what matters.

7 Tips for Making Sure You’re Monitoring Your Services Correctly

Monitoring services isn’t rocket science, but that doesn’t mean you can just turn on a simple dashboard and hope for the best. To make sure you don’t make some beginner mistakes, check out these seven tips for better service monitoring.

Tip #1: Backups Are Your Friend

Monitoring is one thing, but before you start doing anything, make sure you have a reliable way of doing backups. These should follow the 3-2-1 Rule of Backups (three copies of your data, on two different media, with at least one at a different location) when at all possible.
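
As a rough illustration, the 3-2-1 rule can be written down as a check against an inventory of backup copies. This is only a sketch: the `BackupCopy` record, the `satisfies_3_2_1` function, and the sample plan are hypothetical names made up for this example, and it assumes the production data itself counts as one of the three copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data (hypothetical model, for illustration only)."""
    name: str
    medium: str    # e.g. "local-disk", "object-storage", "tape"
    location: str  # e.g. "office", "us-east"


def satisfies_3_2_1(copies, primary_location):
    """Return True if the copies meet the 3-2-1 rule.

    3: at least three copies of the data
    2: on at least two different media
    1: at least one copy somewhere other than the primary location
    """
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media and has_offsite


# Assumed example plan: two local copies plus one off-site archive.
plan = [
    BackupCopy("production database", "local-disk", "office"),
    BackupCopy("nightly snapshot", "local-disk", "office"),
    BackupCopy("weekly archive", "object-storage", "us-east"),
]
print(satisfies_3_2_1(plan, primary_location="office"))  # prints True
```

Dropping the weekly archive from the plan fails all three conditions at once, which is exactly the situation the rule is meant to catch.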

What does this have to do with monitoring?

It’s simple. When there are issues, you’re going to want to jump in there and change things, but sometimes you’ll just need to roll the whole system back to get a better idea of what’s going on with your code. That means your ability to monitor your services effectively depends on your ability to go back in time to an earlier backup point and restore your services from that point. It won’t happen all the time, but when it does happen you want to make sure you don’t suffer any data loss.

This ties into a key part of great monitoring: understanding what the end result of that monitoring is going to be. What’s that? Fixing unexpected issues.

Tip #2: Make Sure You Can Check Per-User Usage

When you Google “software monitoring” you’re going to be faced with a thousand opinions on a wide range of topics. Most of these won’t be useful.

To set some definitions, monitoring your services could be as high-level as watching how much electricity you’re using, or as low-level as reviewing every ‘0’ and ‘1’ that gets processed. Both could be useful in certain contexts, but unless you’re either building your own servers or running your own Google-scale data centers, you won’t need that information.

The ideal level of monitoring is to be able to connect performance changes to specific users.

That means you can see an uptick in load time on a chart and quickly find out which user (or group of users) is the driving force behind that change.

Users are unpredictable, so you should be able to quickly find out the specific account that is causing a performance bottleneck and either cut off or throttle their access.
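
One way to picture per-user monitoring is to attribute request time to the account that caused it. The sketch below assumes each request is recorded as a `(user_id, duration_seconds)` pair; the function names, the sample data, and the 50% threshold are all hypothetical and would need tuning for a real service.

```python
from collections import defaultdict


def time_per_user(requests):
    """Sum request durations per user from (user_id, duration_seconds) pairs."""
    totals = defaultdict(float)
    for user_id, duration in requests:
        totals[user_id] += duration
    return dict(totals)


def slowest_users(requests, top_n=3):
    """Rank users by the total request time they account for."""
    totals = time_per_user(requests)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


def users_to_throttle(requests, share_threshold=0.5):
    """Return users responsible for more than `share_threshold` of all request time."""
    totals = time_per_user(requests)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [user for user, total in totals.items()
            if total / grand_total > share_threshold]


# Assumed sample of recent requests.
sample = [("alice", 0.2), ("bob", 0.1), ("carol", 4.0), ("carol", 3.5), ("alice", 0.3)]
print(slowest_users(sample, top_n=1))  # prints [('carol', 7.5)]
print(users_to_throttle(sample))       # prints ['carol']
```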

Tip #3: API Specific Performance

Much like tracking users, you should be able to find out what API calls are causing issues within seconds. No, this isn’t a memory trick where you memorize every single API call and what its signature is.

This means being able to go from the issue that called your attention to performance issues, down to the API call that is being made, to which service is calling that API. It forms a sort of tree that you can trace all the way to the root of a problem. (Of course, this also brings up the fact that you need good documentation of all of your API services. But, that’s another story for another day.)
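
To make the "tree" concrete, here is a minimal sketch that groups slow calls first by API endpoint and then by the service that made the call. The log format, the service and endpoint names, and the one-second cutoff are assumptions for illustration, not a description of any particular tracing tool.

```python
from collections import Counter

# Hypothetical request log entries: (calling_service, api_endpoint, duration_seconds)
log = [
    ("checkout", "GET /inventory", 0.12),
    ("checkout", "POST /payments", 2.40),
    ("search", "GET /inventory", 0.09),
    ("checkout", "POST /payments", 3.10),
    ("reports", "GET /orders", 0.50),
]


def slow_call_tree(entries, slow_after=1.0):
    """Group slow calls as endpoint -> calling service -> number of slow calls."""
    tree = {}
    for service, endpoint, duration in entries:
        if duration < slow_after:
            continue  # only calls at or above the cutoff are interesting here
        tree.setdefault(endpoint, Counter())[service] += 1
    return tree


for endpoint, callers in slow_call_tree(log).items():
    print(endpoint)
    for service, count in callers.most_common():
        print(f"  called by {service}: {count} slow calls")
```

With the sample log, the only branch left in the tree is `POST /payments`, called by `checkout`, which is the path you would follow to the root of the problem.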

Without being able to do this you’re going to be left searching for an answer without a path to that answer, right in the moment when fast action is required. It’s a terrible situation to be in, especially when the issues are hurting the bottom line of your business.

Tip #4: Record, Record, Record

Monitoring performance is only useful if you can detect anomalies. You won’t know that something is wrong unless you have a rough idea of what is “right.”

But, to know that, you need to have something to compare it to. That comparison should be to the record of how your services normally operate.

Some companies do just the bare minimum of recording their logs, but in reality there’s no excuse for that. Storage space is effectively free (okay, it’s about two cents per gigabyte on AWS); that amount is so small that recording your logs for future comparison is well worth the cost.

This is important because when you do detect an issue, it’s going to be useful to be able to go back in time and see when this issue may have appeared in the past. You can only do that if you’re diligently recording all sessions and activity.
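
A simple way to turn recorded history into a definition of "right" is to compare each new measurement against the mean and standard deviation of past ones. The sketch below is one such approach under stated assumptions: the `is_anomalous` name, the three-standard-deviation threshold, and the sample response times are illustrative choices, and real traffic often needs something more robust (for example, per-hour baselines).

```python
from statistics import mean, stdev


def is_anomalous(history, latest, z_threshold=3.0):
    """Flag `latest` if it is more than `z_threshold` standard deviations
    away from the mean of the recorded `history`."""
    if len(history) < 2:
        return False  # not enough recorded data to define "normal"
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return latest != baseline
    return abs(latest - baseline) / spread > z_threshold


# Assumed sample of recorded response times, in milliseconds.
response_times_ms = [120, 118, 125, 130, 122, 119, 127]
print(is_anomalous(response_times_ms, 124))  # prints False
print(is_anomalous(response_times_ms, 480))  # prints True
```

Note that the function cannot say anything useful when there is no history, which is the practical argument for recording everything up front.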

Tip #5: Use Prometheus

Many companies rely on having scripts written that test their services, believing that these pre-written scripts are crucial to effective monitoring. They believe that these scripts take the burden off their shoulders and allow them to focus more on the issues that matter.

We have a different perspective.

We use Prometheus, which easily plugs into any development environment. It is built around the idea of scraping all available metrics exposed at an endpoint and then storing them for future evaluation as needed. Compared to pre-written scripts, it takes much of the pain out of reliably testing your services.
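
To give a feel for what "metrics exposed at an endpoint" means, Prometheus scrapes plain text over HTTP (conventionally from a `/metrics` path). The sketch below renders a few counters in that text format using only the standard library. The `render_metrics` function and the metric names are hypothetical, label values are not escaped, and in practice you would more likely use an official client library such as `prometheus_client` than format the text by hand.

```python
def render_metrics(counters):
    """Render counters in the Prometheus text exposition format.

    `counters` maps (metric_name, labels) to a value, where `labels`
    is a tuple of (key, value) string pairs. Label values are assumed
    to need no escaping in this simplified sketch.
    """
    lines = []
    announced = set()
    for (name, labels), value in sorted(counters.items()):
        if name not in announced:
            lines.append(f"# TYPE {name} counter")
            announced.add(name)
        if labels:
            label_text = ",".join(f'{key}="{val}"' for key, val in labels)
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Assumed example metrics for a small web service.
counters = {
    ("http_requests_total", (("method", "GET"), ("path", "/"))): 42,
    ("http_requests_total", (("method", "POST"), ("path", "/orders"))): 7,
    ("jobs_failed_total", ()): 1,
}
print(render_metrics(counters), end="")
```

Serving this text from an HTTP handler is all a service needs to do; collecting, storing, and querying the values over time is left to Prometheus.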

Tip #6: Triaging Using PagerDuty

PagerDuty is one of the most valuable tools you can invest in. If you ask any engineering manager from a world class technology company, they’ll quickly nod their head in agreement.

Without this turning into an advertisement for a specific tool (there are alternatives, though they are less widely used), PagerDuty is almost required to bring your monitoring up to par with your competition. It will notify you when incidents occur, give you pertinent information so you can resolve the issue, and allow you to focus on solving issues rather than figuring out what the issues are. That saves you time and money.

Tip #7: Know What You’re Tracking

At the end of the day, none of these tips are going to be useful if you don’t identify which key metrics are worth tracking. Depending on your service and how your users use it, these could be memory, CPU, and disk usage, network bandwidth, or any number of other variables.

On a more granular level, you should also make sure you know what application-specific metrics need to be tracked. These will be different for every application, but if you closely examine how your users are using your service you’ll quickly be able to see where potential bottlenecks are.
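
As a small sketch of combining a system metric with an application-specific one, the example below reads disk usage from the standard library and checks both values against limits. The `queue_depth` metric, the limit values, and the function names are hypothetical; which metrics and thresholds matter depends entirely on your application.

```python
import shutil


def disk_used_percent(path="/"):
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def breached(samples, limits):
    """Return the sorted names of metrics whose sample exceeds its limit."""
    return sorted(name for name, value in samples.items()
                  if name in limits and value > limits[name])


# Hypothetical limits; pick values that match how your users use the service.
limits = {"disk_used_percent": 90.0, "queue_depth": 1000}
samples = {"disk_used_percent": disk_used_percent(), "queue_depth": 1200}
print(breached(samples, limits))
```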

Bonus: Communicating Service Outages with Users

You need a service uptime dashboard. It’s a requirement for any business that people will notice in the event of an outage. This isn’t strictly related to how you monitor your services, but it should be considered part of the same meta-problem: building infrastructure that is ready for whatever is thrown at it.

There are plenty of options for providing this dashboard, from building it in-house to outsourcing it to a third-party tool. Whatever you choose, remember: your users will want answers, so anticipate how you’re going to provide them.

What does it take to deploy and run scalable, high-performance web apps?

We love web applications at Eldarion. Whether we’re working on our own projects (for wine connoisseurs, fitness buffs, or folks who might want to microblog with a few more than 140 characters), or partnering with our customers to launch new applications and innovate in their markets, there is something satisfying about creating something that helps solve problems, reduce friction, or deliver a delightful user experience.

Modern web application development is teeming with programming languages, frameworks, tools, and APIs to help us iterate quickly and ship applications that provide rich experiences in the browser or via mobile apps.

But what about running these applications? Having the ability to quickly demonstrate a new feature before we roll it out to all of our customers? Ensuring that the application is highly-available, and continues to perform well as its audience grows?

Running a modern web application that serves a high volume of visitors requires:

  • A reliable and scalable infrastructure
  • An extensible system architecture
  • Expertise in maintaining, monitoring and securing the underlying system architecture
  • A deep understanding of how the deployed application(s) work

Eldarion Cloud provides a customized, end-to-end solution to meet all of these criteria.

Eldarion Cloud leverages Google Cloud Platform and the latest advances in containerization software to create an extensible, scalable PaaS (Platform-as-a-Service) that we pair with our expertise in providing DevOps and application development services.

Why we’ve chosen to build Eldarion Cloud on IaaS (Infrastructure-as-a-Service)

Google Cloud Platform (GCP), Amazon Web Services (AWS) and Microsoft Azure are the top three IaaS offerings. Each provider has data centers across the world and offers redundancy within each region. They handle the physical maintenance of each data center and ensure that the servers running in those data centers have reliable network connectivity and power. The providers also manage security at the data center level.

While we’ve chosen to deploy Eldarion Cloud on GCP, the system can be easily adapted to the other services as well. Private cloud, hybrid, and on-premises deployments are also possible.

Why Containerization and Kubernetes

Companies such as Google and Facebook are able to rapidly develop and continuously release new capabilities and bug fixes by aligning development and IT with modern DevOps principles, and by building containerized applications and microservices that can be quickly and easily deployed to powerful, cloud-based, IaaS platforms.

Many of the underlying software components used in these platforms are available via open-source software projects. However, using these open-source projects requires an investment of time and effort. Rather than diverting your resources away from the work that really matters (developing your software), you can take advantage of Eldarion’s expertise and the effort we’ve already put into deploying a system built upon these powerful components.

Specifically, Eldarion Cloud is built on Kubernetes, CoreOS and Docker, arguably the most popular and most widely adopted open-source containerization infrastructure platforms in use today.

To this, we’ve added Kel (and Kel plug-ins), a layer of Eldarion-developed open-source tools and components that make it easy for us to maintain and optimize the system for our customers’ applications.

Eldarion DevOps Expertise

Eldarion Cloud builds on our experience hosting web applications for our clients for over 7 years, and our success with Gondor, a leading PaaS for hosting Python/Django applications.

We work closely with each of our customers to determine their needs in areas such as:

  • Monitoring and alerts
  • High availability and system failover
  • Disaster recovery
  • Security audits

We can then customize an Eldarion Cloud offering that fits with your business objectives and budget.

Eldarion Application-level Expertise

Our team has deep familiarity with the development of web applications and, when needed, we can bring in additional Django/Python, Node, React and Angular software development experts to augment and/or support your software development team.

This means in many instances that we are able to:

  • Tailor your DevOps infrastructure to meet the specific needs of your applications
  • Use our software development expertise to help find and resolve performance pain points
  • Keep your applications and their dependent components up to date with the latest security releases

In short, Eldarion Cloud is built with our deep knowledge of modern web application development and the infrastructure needed to run it. By combining this with Google’s data center and infrastructure expertise we can ensure Eldarion Cloud is the perfect solution to support the continuous development and release of your applications on a stable, scalable platform.

~~~~~~~~~~~~~~

You can follow me on Twitter via @jacobwegner, or request a meeting with me to discuss Eldarion Cloud in more detail via info@eldarion.cloud

Introducing Eldarion Cloud

If you follow Eldarion at all, or have previously seen anything about our recently announced open-source, DevOps PaaS called Kel™, you know Eldarion Cloud was never a secret. Well, now it’s official. And available.

Introducing Eldarion® Cloud, our commercial implementation of Kel plus expert, white-glove services that enable you to focus your development resources on your applications, not your application infrastructure.

Eldarion Cloud is powered by Kubernetes, CoreOS and Docker running on the Google Cloud Platform (GCP) — in my opinion, the most important infrastructure technologies to emerge in the last few years.

While Eldarion Cloud is a new offering, it is based on 7-plus years of commercial DevOps PaaS experience with a product we called Gondor. In fact, in many respects, Eldarion Cloud is the next generation of Gondor.

And it’s already been battle-tested in a number of live deployments by Gondor users who’ve already migrated to Eldarion Cloud.

Why Kubernetes (CoreOS, Docker and GCP)?

As you may already know, Kubernetes is the wildly-popular, open-source container orchestration system developed by Google. It’s based on a decade and a half of their experience running production workloads. And, we like it. A lot.

  • Its architecture is very flexible
  • Its APIs are easy to work with
  • And it’s massively scalable (Google calls it “planet scale”)

Kubernetes allows us to offer you future-proof scalability plus compatibility with many existing and emerging tools and technologies.

CoreOS is the purpose-built, lightweight container operating system for Docker applications. It was developed by a team of former Rackspace and Google DevOps pros. It provides us with a lean, infrastructure-independent system that enables us to easily meet our customers’ business continuity and redundancy goals.

Docker has emerged as the de facto way to containerize modern web applications. I recommend this ZDNet article for background.

For public cloud deployments, Google Cloud Platform is the obvious first choice for web applications built on the Kubernetes stack. However, if you require Azure or AWS, we’re open to having that discussion.

We’re also happy to discuss private or hybrid clouds, and on-premises deployments.

Why is everything open source?

There’s no need to enumerate the well-known advantages of open-source software here. Suffice it to say we’re huge open-source believers here at Eldarion, and in the DevOps world (which is largely composed of developers) it’s more or less a table-stakes requirement these days.

We’re committed to using OSS components whenever possible, and to contributing our original works to the OSS community wherever and whenever it makes sense.

Final Thoughts

We are very excited to be introducing a truly unique DevOps PaaS solution that’s specifically designed to address the unmet needs of startups, SMBs and product teams within larger organizations who would otherwise have to wait in line for corporate DevOps resources.

Contact us if you’d like to schedule a meeting with us to learn more, follow us on Twitter, or add yourself to the Eldarion Cloud mailing list.

You can also find me on Twitter via @jtauber.

Goodbye Gondor / Hello Kel and the Eldarion Cloud

(previously published on Eldarion.com)

As you may have heard, Eldarion is moving Gondor, its DevOps PaaS for Python / Django applications, to Kubernetes, the open-source container orchestration platform originally developed by Google. Kubernetes allows us to greatly expand scalability, enabling us to offer support for more technology stacks as well as private cloud, public cloud and on-premises versions.

Eldarion will be open sourcing this work in a project named Kel™.

If you’re interested in getting involved in the Kel open-source project, please sign up on kelproject.com.

For existing Gondor customers this is great news. You’ll get the same high-quality PaaS services from Eldarion you’re accustomed to today with Gondor on a new, more scalable platform built with the fastest-growing container orchestration platform — Kubernetes. We will do most of the work to move you over if you wish, and most changes will be totally transparent to you. It’s largely under the covers.

In addition, anyone will be able to build Kel into their DevOps environment on their own via the corresponding open-source project.

What does this mean for existing Gondor customers?

Our current plan is to end support for Gondor on September 2nd of this year. In its place, we will be offering a new service called Eldarion® Cloud.

Eldarion Cloud will be a Kel-based offering supported by white-glove, high-touch application hosting and managed DevOps with an initial focus on Python / Django applications. This service will provide you with all the power, flexibility and scalability of the Kel open source platform, along with levels of service and support for your application environment that you won’t find with other PaaS offerings.

However, Eldarion will not be offering low-cost, entry-level options under $500 as we currently do with Gondor. Organizations who wish to spend less than $500 are encouraged to consider using Kel in its open source form, or seek out other offerings to support their needs. Everyone else is encouraged to consider migrating to Eldarion Cloud. We think it’s the perfect fit for current Gondor users, and one we’re confident you won’t find anywhere else.

For more information on Eldarion Cloud, please see eldarion.cloud or contact us at info@eldarion.cloud.

Whether you’re migrating from Gondor to Eldarion Cloud, need consulting services on performance optimizations, or custom application development, Eldarion stands ready to help you with all aspects of running your web applications.

We would like to thank our existing Gondor customers for their dedication to and support of the Gondor platform over the past five years. We encourage you to reach out with any questions you may have about the sunsetting of Gondor, or the new and exciting offerings we’re developing around the Kubernetes platform. We look forward to hearing from you.

Rebuilding Gondor on Kubernetes

(previously published on Gondor.io)

Eldarion has been running Gondor, our PaaS offering, successfully for the last five years. Over the last several months we have been looking to expand what Gondor can run (i.e. more than just Python) and where it can run (i.e. alternative infrastructure providers, private clouds, etc). We’ve also wanted to give our users more control over scaling their infrastructure. Our existing platform makes accomplishing these things difficult. With this in mind, we set out to rebuild our platform using newer technologies.

We started by identifying the best technology to use. During our research, we found that the tools provided by CoreOS were exactly what we were after. CoreOS designed each tool to fill a gap in building scalable Linux servers:

  • etcd deals with consensus and discovery.
  • fleet gives you a distributed init system.
  • flannel provides a network overlay, giving containers running on several nodes a flat network.

We built out some initial prototypes with these tools and found we were on the right path.

During CoreOS Fest in May, there was a lot of buzz around the Kubernetes project. The team decided to learn more about what it could provide and started prototyping Gondor components on Kubernetes. This was an incredible success. The capabilities native to Kubernetes made prototyping fast and efficient. Everything fit perfectly. Kubernetes solves every issue with our existing platform. It is lean, portable, extensible and self-healing.

Kubernetes is set up as a cluster consisting of a master and many nodes. The master is further broken up into individual parts that manage what happens on each node. The API server is responsible for dealing with first-class objects of Kubernetes (pods, replication controllers, services, namespaces, etc). Where Kubernetes really shines is in its network layout. Each node runs a proxy capable of forwarding packets and load balancing requests to any container running in the cluster.
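
To illustrate the idea of a per-node proxy spreading requests across containers anywhere in the cluster, here is a toy sketch. It is not Kubernetes code: the `RoundRobinProxy` class and the pod addresses are hypothetical, and round-robin selection is assumed purely for illustration rather than as a description of how the Kubernetes proxy chooses a backend.

```python
import itertools


class RoundRobinProxy:
    """Toy stand-in for a per-node proxy (illustration only, not Kubernetes code)."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick_backend(self):
        """Return the next backend address in round-robin order."""
        return next(self._cycle)


# Hypothetical container addresses spread across several nodes.
proxy = RoundRobinProxy(["10.2.1.5:8000", "10.2.2.7:8000", "10.2.3.9:8000"])
print([proxy.pick_backend() for _ in range(4)])
# prints ['10.2.1.5:8000', '10.2.2.7:8000', '10.2.3.9:8000', '10.2.1.5:8000']
```

The point of the sketch is that a client only ever talks to the proxy; which node the chosen container runs on is hidden behind the flat network.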

Today, we have completely rebuilt all Gondor components to run on Kubernetes. We have started some internal testing of our own sites. We are planning to roll out the new platform to customers in the coming months.

We are very excited for the 1.0 launch of Kubernetes on Tuesday, July 21. We will be upgrading our clusters this week. The Kubernetes community has been extremely helpful and inviting. We have open-sourced two projects (k8s-http-router and piper) from the Gondor rewrite and will be contributing back even more! Stay tuned to our blog as we will be posting more technical content about our new platform.

A new helmsman has arrived on the Bay of Belfalas!