A Closer Look at Cloud FinOps
When I was first introduced to Agile development, it felt like a natural flow for developers and business stakeholders to collaborate and deliver functionality in short iterations. It was rewarding (and sometimes disappointing) to demo features every two weeks and get direct feedback from users. Continuous Integration tools matured to make the delivery process more automated and consistent. However, the operations team was left out of this process. Environment provisioning, maintenance, exception handling, performance monitoring, security – all these aspects were typically deprioritized in favor of keeping the feature release cadence. The DevOps/DevSecOps movement emerged as a cultural and technical answer to this dilemma, advocating for a much closer relationship between development and operations teams.
Today, companies are rapidly expanding their cloud infrastructure footprint. What I’ve heard from discussions with customers is that the business value driven by the cloud is simply too great to ignore. However, much like the relationship between development and ops teams during the early Agile days, a gap is forming between Finance and DevOps teams. Traditional infrastructure budgeting and planning doesn’t work when you’re moving from a CapEx to OpEx cost structure. Engineering teams can provision virtually unlimited cloud resources to build solutions, but cost accountability is largely ignored. Call it the pandemic cloud spend hangover.
Our customers see the flexibility of the cloud as an innovation driver rather than simply an expense. But they still need to understand the true value of their cloud spend – which products or systems are operating efficiently? Which ones are wasting resources?
I decided to look into FinOps practices to discover techniques for optimizing cloud spend. I researched the FinOps Foundation and read the book, Cloud FinOps. Much like the DevOps movement, FinOps seeks to bring cross-functional teams together before cloud spend gets out of hand. It encompasses both cultural and technical approaches.
Here are some questions that I had before and the answers that I discovered from my research:
Where do companies start with FinOps without getting overwhelmed by yet another oversight process?
Start by understanding where your costs are allocated. Understand how the cloud provider’s billing details are laid out and seek to apply the correct costs to a business unit or project team. Resource tagging is an essential first step to allocating costs. The FinOps team should work together to come up with standard tagging guidelines.
Don’t assume the primary goal is cost savings. Instead, approach FinOps as a way to optimize cloud usage to meet your business objectives. Encourage reps from engineering and finance to work together to define objectives and key results (OKRs). These objectives may be different for each team/project and should be considered when making cloud optimization recommendations. For example, if one team’s objective is time-to-market, then costs may spike as they strive to beat the competition.
What are some common tagging/allocation strategies?
Cloud vendors provide granular cost data down to the millisecond of usage. For example, AWS Lambda recently moved from billing duration in 100 ms increments to billing in 1 ms increments. However, it’s difficult to determine which teams, projects, or initiatives are using which resources and for how long. For this reason, tagging and cost allocation are essential to FinOps.
According to the book, there are generally two approaches for cost allocation:
- Tagging – these are resource-level labels that provide the most granularity.
- Hierarchy-based – these are at the cloud account or subscription-level. For example, using separate AWS accounts for prod/dev/test environments or different business units.
Their recommendation is to start with hierarchy-based allocations to ensure the highest level of coverage. Tagging is often overlooked or forgotten by engineering teams, leading to unallocated resources. This doesn’t mean you should skip tags; rather, define a consistent tagging strategy so that teams know what is expected of them.
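As a rough illustration of how the two approaches can work together, here is a minimal sketch that allocates each billing line item by its resource tag and falls back to the owning account when the tag is missing. The account names, the `team` tag key, and the shape of the billing records are all hypothetical; real billing exports differ by provider.

```python
from collections import defaultdict

# Hypothetical mapping from cloud account to owning team (hierarchy-based allocation).
ACCOUNT_OWNERS = {"acct-prod-payments": "payments", "acct-dev-shared": "platform"}


def allocate_costs(line_items, tag_key="team"):
    """Sum cost per team: prefer the resource tag, fall back to the account owner."""
    totals = defaultdict(float)
    for item in line_items:
        team = (
            item.get("tags", {}).get(tag_key)      # most granular: resource tag
            or ACCOUNT_OWNERS.get(item["account"])  # fallback: account hierarchy
            or "unallocated"                        # nothing matched
        )
        totals[team] += item["cost"]
    return dict(totals)


# Made-up billing records for illustration only.
billing = [
    {"account": "acct-prod-payments", "cost": 120.0, "tags": {"team": "checkout"}},
    {"account": "acct-prod-payments", "cost": 80.0, "tags": {}},
    {"account": "acct-unknown", "cost": 15.5, "tags": {}},
]
print(allocate_costs(billing))
```

The "unallocated" bucket is worth tracking in its own right, since it shows how much spend neither strategy covers yet.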
How do you adopt a FinOps approach without disrupting the development team and slowing down their progress?
The nature of usage-based cloud resources puts spending responsibility on the engineering team since inefficient use can affect the bottom line. This is yet another responsibility that “shifts left,” that is, moves earlier in the development process. In addition to shifting left on security, testing, deployment, and so on, engineering is now expected to monitor its cloud usage. How can FinOps alleviate some of this pressure so developers can focus on innovation?
Again, collaboration is key. Demands to reduce cloud spend cannot be a one-way conversation. A key theme in the book is to centralize rate reduction and decentralize usage reduction (cost avoidance).
- Engineering teams understand their resource needs so they’re responsible for finding and reducing wasted/unused resources (i.e., decentralized).
- Rate reduction techniques like using reserved instances and committed use discounts are best handled by a centralized FinOps team. This team takes a comprehensive view of cloud spend across the organization and can identify common resources where reservations make sense.
Usage reduction opportunities, such as right sizing or shutting down unused resources, should be identified by the FinOps team and provided to the engineering teams. These suggestions become technical debt and are prioritized along with other work in the backlog. Quantifying the potential savings of a suggestion allows the team to determine if it’s worth spending the engineering hours on the change.
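One simple way to quantify a suggestion is a payback period: how many months of savings it takes to recover the engineering time spent. The sketch below is only an illustration; the function name, the hourly rate, and the example figures are assumptions, not numbers from the book.

```python
def payback_months(monthly_savings, engineering_hours, hourly_rate):
    """Months until the one-time engineering cost is recovered by the savings."""
    if monthly_savings <= 0:
        return float("inf")  # the change never pays for itself
    return (engineering_hours * hourly_rate) / monthly_savings


# Hypothetical right-sizing suggestion: 16 hours of work at $100/hour saves $400/month.
months = payback_months(monthly_savings=400, engineering_hours=16, hourly_rate=100)
print(f"Payback in {months:.1f} months")
```

A team could compare this figure across backlog items to decide which optimizations are worth scheduling first.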
How do you account for cloud resources that are shared among many different teams?
Allocating cloud spend to specific teams or projects based on tagging ensures that costs are distributed fairly and accurately. But what about shared costs like support charges? The book provides three examples for splitting these costs:
- Proportional – Distribute proportionally based on each team’s actual cloud spend. The more you spend, the higher the allocation of support and other shared costs. This is the recommended approach for most organizations.
- Evenly – split evenly among teams.
- Fixed – Pre-determined fixed percentage for each team.
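The three splitting methods above can be sketched in a few lines. The team names, spend figures, and fixed percentages below are made up for illustration.

```python
def split_shared_cost(shared_cost, direct_spend, method="proportional", fixed_shares=None):
    """Split a shared cost (e.g. support charges) across teams.

    direct_spend maps team -> that team's own cloud spend.
    fixed_shares maps team -> fraction of the shared cost (used by "fixed" only).
    """
    teams = list(direct_spend)
    if method == "proportional":
        total = sum(direct_spend.values())
        if total == 0:
            raise ValueError("proportional split needs non-zero total spend")
        return {t: shared_cost * direct_spend[t] / total for t in teams}
    if method == "even":
        return {t: shared_cost / len(teams) for t in teams}
    if method == "fixed":
        if fixed_shares is None:
            raise ValueError("fixed split needs fixed_shares")
        return {t: shared_cost * fixed_shares[t] for t in teams}
    raise ValueError(f"unknown method: {method}")


# Hypothetical monthly figures.
spend = {"web": 6000.0, "data": 3000.0, "mobile": 1000.0}
support = 1000.0

print(split_shared_cost(support, spend, "proportional"))
print(split_shared_cost(support, spend, "even"))
print(split_shared_cost(support, spend, "fixed", {"web": 0.5, "data": 0.3, "mobile": 0.2}))
```

With these numbers, the proportional split charges the web team 60% of the support cost because it accounts for 60% of direct spend, while the even split charges every team one third regardless of usage.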
Overall, I thought the authors did a great job of introducing Cloud FinOps without overwhelming the reader with another rigid set of practices. They encourage the Crawl/Walk/Run approach to get teams started on understanding their cloud spend and where they can make incremental improvements. I had some initial concerns about FinOps bogging down the productivity and innovation coming from engineering teams. But the advice from practitioners is to provide data to inform engineering about upward trends and cost anomalies. Teams can then make decisions on where to reduce usage or apply for discounts.
The cloud providers are constantly changing, introducing new services and cost models. FinOps practices must also evolve. I recommend checking out the Cloud FinOps book and the related FinOps Foundation website for up-to-date practices.
The video below is Part 2 of our 3-part series: Building and Securing Serverless Apps using AWS Amplify. In case you missed Part 1 – take a look at it here. Be sure to stay tuned for Part 3!
According to Deutsche Bank CIO Frederic Veron, “enterprises that wish to reap the potentially rich rewards of getting IT and business line leaders to build software together in agile fashion must also embrace the DevOps model.”
Why is that? It’s simple: DevOps is necessary to scale Agile. DevOps practices are what enable an organization to rapidly deploy changes to many different parts of their product, across many products, on a frequent basis—with confidence.
That last part is key. Companies like Amazon, Google, and Netflix developed DevOps methods so that they could deploy frequently at a massive scale without worrying if they will break something. DevOps is, at its core, a risk management strategy. DevOps practices are what enable you to maintain a complex multi-product ecosystem and make sure that everything works. DevOps substitutes traditional risk management approaches with what the Agile 2 authors call real-time risk management.
You might think that all this is just for software product companies. But today, most organizations operate on a technology platform, and if you do, then DevOps applies. DevOps methods apply to any enterprise that creates and maintains products and services that are defined by digital artifacts.
That includes manufacturers, online commercial services, government agencies that use custom software to provide services to constituents, and pretty much any large commercial, non-profit, and public sector enterprise today.
As JetBlue and Breeze airlines founder David Neeleman said, “we’re a high-tech company that just happens to fly airplanes,” and Capital One Bank’s CIO Rob Alexander said, “We’re a founder-led, 20-year-old technology company.”
Most large businesses today are fundamentally technology companies that direct their efforts toward the markets in which they have expertise, assets, and customer relationships.
DevOps Is Necessary at Scale
Scaling frameworks such as SAFe and DA provide potentially useful patterns for organizing the work of lots of teams. However, DevOps is arguably more important than any framework, because without DevOps methods, scaling is not even possible, and many organizations (Google, Amazon, Netflix…) use DevOps methods at scale without a scaling framework.
If teams cannot deploy their changes without stepping on each other’s work, they will often be waiting or going no faster than the slowest team, and lots of teams will have a very difficult time managing their dependencies—no framework will remedy that if the technical methods for multi-product dependency management and on-demand deployment at scale are not in place. If you are not using DevOps methods, you cannot scale your use of Agile methods.
How Does Agile 2 View DevOps?
DevOps as it is practiced today is technical. When you automate things so that you can make frequent improvements to your production systems without worrying about a mistake, you are using DevOps. But DevOps is not a specific method. It is a philosophy that emerged over time. In practice, it is a broad set of techniques and approaches that reflect that common philosophy.
With the objective of not worrying in mind, you can derive a whole range of techniques to leverage tools that are available today: cloud services, elastic resources, and approaches that include horizontal scaling, monitoring, high-coverage automated tests, and gradual releases.
While DevOps and Agile seem to overlap, especially philosophically, DevOps techniques are highly technical, while the Agile community has not focused on technical methods for a very long time. Thus, DevOps fills a gap, and Agile 2 promotes the idea that Agile and DevOps go best together.
DevOps evangelist Gene Kim has summarized DevOps by his “Three Ways.” One can paraphrase those as follows:
- Systems thinking: always consider the whole rather than just the part.
- Use feedback loops to learn and refine one’s artifacts and processes over time.
- Treat everything as an experiment that you learn from, and adjust accordingly.
The philosophical approaches are very powerful for the DevOps goal of delivering frequent changes with confidence, because (1) a systems view informs you on what might go wrong, (2) feedback loops in the form of tests and automated checks tell you if you hit the mark or are off, and (3) if you view every action as an experiment, then you are ready to adjust so that you then hit the mark. In other words, you have created a self-correcting system.
Agile 2 takes this further by focusing on the entire value creation flow, beginning with strategy and defining the kinds of leadership that are needed. Agile 2 promotes product design and product development as parallel and integrated activities, with feedback from real users and real-world outcomes wherever possible. This approach embeds Gene Kim’s three DevOps “ways” into the Agile 2 model, unifying Agile 2 and DevOps.
 Agile 2: The Next Iteration of Agile, by Cliff Berg et al, pp 205 ff.
We are all human, and we tend to take the easy route when we can. Remembering passwords is a daily chore these days, so we often create passwords that are easy to remember to avoid the trouble of resetting them when we forget. In this blog, I am going to discuss a tool called “Have I Been Pwned” (HIBP), which helps us find passwords that have appeared in known cybersecurity or data breaches.
What is HIBP? What is it used for?
“Have I Been Pwned” is an open-source initiative that helps people check whether their login information has been included in any breached data archives circulating on the dark web. It also allows users to check how often a given password has been found in the dataset, which is a way of testing the strength of a password against dictionary-style brute-force attacks. Recently, the FBI released a statement that it is going to work closely with the HIBP team to share breached passwords for users to check against. This open-source initiative is going to help a lot of customers avoid using breached passwords when creating accounts on the web. We used the HIBP API to alert our customers who use custom web-based applications to any pwned passwords they enter while creating accounts. In this way, users learn to avoid breached passwords that have been seen multiple times on the dark web.
How does it work?
HIBP stores more than half a billion pwned passwords that have previously been exposed in data breaches. The entire data set is both downloadable and searchable online via the Pwned Passwords page. Each entry is the SHA-1 hash of a UTF-8 encoded password, followed by a colon (:) and the number of times that password has been seen; entries are separated by a CRLF.
If we use an online API to check whether a password has been breached, we cannot send the actual source password over the web, because doing so would expose the password the user entered during account creation.
To maintain anonymity and protect the value of the source password being searched for, Pwned Passwords implements a k-Anonymity model that allows a password to be searched for by partial hash, using a search by range. We only need to pass the first 5 characters of the SHA-1 password hash (not case-sensitive) to the API, which responds with the suffix of every hash beginning with that prefix, each followed by a count of how many times it appears in the dataset. The API consumer then looks for the suffix of the source password hash among the returned suffixes. If the source hash is not found in the results, the password has not appeared in a known breach to date.
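The range search described above can be sketched as follows. This is a minimal illustration rather than the Pass2Play implementation: the function names and the `User-Agent` value are hypothetical, and it assumes the Pwned Passwords range endpoint at `https://api.pwnedpasswords.com/range/` returning `SUFFIX:COUNT` lines.

```python
import hashlib
import urllib.request

# Pwned Passwords range endpoint; the 5-character hash prefix is appended to it.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest of a UTF-8 password."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix, response_text):
    """Look for our suffix among the 'SUFFIX:COUNT' lines returned for a prefix."""
    for line in response_text.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0  # suffix absent: not seen in the data set to date


def pwned_count(password):
    """Send only the 5-character prefix to the API and match the suffix locally."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-check-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return count_in_range(suffix, resp.read().decode("utf-8"))
```

Calling `pwned_count("P@ssword")` would make one network request containing only the first five hash characters; the full hash and the password itself never leave the client.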
Pass2Play is one of our custom web-based solutions where we integrated the password breach API to detect any breached passwords during the sign-up process. Below is the workflow:
- The user goes to sign up for the account.
- Enters username and password to sign up.
- After entering the password, the user gets a warning message if the password was ever breached, along with how many times it was seen.
In the above screen, the user entered the password “P@ssword” and got a warning message that clearly says the entered password has been seen 7491 times in the dataset circulating on the dark web. We do not want our users choosing such passwords, because their accounts could later be compromised through dictionary-style brute-force attacks.
Architecture and Process flow diagram:
API Request and Response example:
SHA-1 hash of P@ssword: 9E7C97801CB4CCE87B6C02F98291A6420E6400AD
Response: Returns 550 lines of hash suffixes that match the first 5 chars
The highlighted text in the above image is the suffix that, together with the 5-character prefix, matches the hash of the source password; it has been seen 7491 times.
I would like to conclude this blog by saying that integration of such methods in your applications can help organizations avoid larger security issues since passwords are still the most common way of authenticating users. Alerting the end-users during account creation will make them aware of breached passwords which will also train the end users on using strong passwords.
As 2020 has unfolded, our development team has been working on a brand new app: Pass2Play! Check out the video below to see all of its features and capabilities!
To learn more about Pass2Play click here!
I have a deep interest in cybersecurity, and to keep up with the latest threats, policies, and security practices, I became a member of ACT-IAC and joined its Cybersecurity Community of Interest. This is where I got the opportunity to volunteer on the Zero Trust Architecture Phase 2 project, and I would like to share what I learned about ZTA strategy and principles. I plan to break this blog into a four-part series based on how the project progresses.
- What is ZTA?
- Real world deployment scenarios
- ZTA core capabilities
- Vendors providing ZTA capabilities
What is ZTA and how did it come into existence?
Traditionally, perimeter-based security has been used to protect the network infrastructure behind a firewall. Once users are authenticated, they can access all the resources behind the firewall, because all network users and devices are assumed to be trustworthy. This model has led to many security breaches across the globe in which attackers moved laterally and exploited resources they were not authorized to access. The attackers only had to get through the firewall; after that, they could crawl across any resource available in the network, causing data loss and the financial damage that comes with ransomware attacks.
Today, an enterprise’s infrastructure spans several networks: cloud-based services, remote users connecting from their own networks on enterprise-owned or personal devices (laptops, mobile devices), and network locations that change depending on where users and devices connect from (e.g., public Wi-Fi or internal enterprise networks). These complex use cases drove the move away from perimeter-based security toward “perimeter-less” security (not confined to one network infrastructure), which led to a new concept called “Zero Trust,” where you “never trust, always verify.” The ZT approach is primarily focused on data protection, but it can be applied across other enterprise assets such as users, devices, applications, and infrastructure.
ZTA is an enterprise cybersecurity strategy that aims to prevent data breaches and limit lateral movement within the network infrastructure. It assumes that no internal or external agent (user, device, application, infrastructure) that wants to access an enterprise resource (on the internal network or externally in the cloud) is trustworthy, so each request must be verified before access is granted.
What does Zero Trust mean in a ZTA?
In the above diagram, the user who is trying to access the resource must go through the policy decision point/policy enforcement point (PDP/PEP). The PDP/PEP decides whether to grant the request based on enterprise policies (data/access/risk), user identity, device profile, location of the user, time of request, and any other attributes needed to gain enough confidence. Once access is granted, the user is in an “Implicit Trust Zone,” where they can access all the resources that the network infrastructure design allows. The “Implicit Trust Zone” is like the boarding area in an airport, where all passengers are considered trustworthy once they have passed the immigration/security check.
You can still limit access to certain resources in the network using a concept called “Micro-Segmentation”. For example, after getting through the security check and reaching the boarding area, passengers are again checked at the boarding gate to make sure they are entering the authorized flight to reach their destination. This is what “Micro Segmentation” means where the resources are more isolated to a segment and access requests are verified separately in addition to PDP/PEP.
Tenets of ZTA: (As per NIST SP 800-207 publication)
All resources, whether data or services, should communicate securely irrespective of their network location. Each individual access request is verified before access to any resource is granted, based on the client’s identity, the device making the request, the type of application used, location coordinates, and other behavioral attributes. Each granted request is authenticated and authorized dynamically, and the decision is strictly enforced. In addition, the enterprise should collect activity information, log decisions, keep audit logs, and monitor the network infrastructure to improve its overall security posture.
What are the logical components of ZTA?
Policy Engine: Responsible for making and logging the decision to grant or deny a request, based on enterprise policy and inputs from external sources (CDM, threat intelligence, etc.).
Policy Administrator: Responsible for establishing or killing the communication path between the subject and enterprise resource based on the decision made by PE. It can generate authentication tokens for the client to access the resource. PA communicates with PEP via the control plane.
Policy Enforcement Point: Responsible for enabling, monitoring and terminating communication between subject and enterprise resource. It can be either used as a single logical component or can be broken into two components: the client agent and resource gateway component that controls access. Beyond the PEP is the “Implicit Trust Zone” to access enterprise resources.
Control Plane/Data Plane: The control plane is made up of components that receive and process requests from the data plane components that wish to access network resources. The control and data planes are more like zones in the ZTA. All the resources, devices, and users within the network can have their own control plane component within them to decide whether the data should be routed further or not. In this diagram, it is just used to explain how control plane works for data plane components. Data plane simply passes packets around and the control plane routes them appropriately based on decisions made.
Note: The dotted line that you see in the image above is the hidden network that is used for communication between the various logical components.
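To make the division of labor between the three components concrete, here is a toy sketch of a request passing through a Policy Engine, Policy Administrator, and Policy Enforcement Point. Everything in it is hypothetical (the `AccessRequest` fields, the `POLICY` table, the token format); a real PE would also weigh inputs such as CDM data and threat intelligence.

```python
import secrets
from dataclasses import dataclass


@dataclass
class AccessRequest:
    user: str
    device_compliant: bool
    location: str
    resource: str


# Hypothetical enterprise policy: who may reach which resource, and from where.
POLICY = {
    "payroll-db": {"users": {"alice"}, "locations": {"office", "vpn"}},
}


def policy_engine(req):
    """PE: decide grant/deny from enterprise policy plus request attributes."""
    rule = POLICY.get(req.resource)
    return bool(
        rule
        and req.user in rule["users"]
        and req.location in rule["locations"]
        and req.device_compliant
    )


def policy_administrator(req):
    """PA: act on the PE decision by issuing a session token, or None on deny."""
    return secrets.token_hex(16) if policy_engine(req) else None


def policy_enforcement_point(req):
    """PEP: open the communication path only when the PA issued a token."""
    token = policy_administrator(req)
    return "connection opened" if token else "connection refused"


print(policy_enforcement_point(AccessRequest("alice", True, "office", "payroll-db")))
print(policy_enforcement_point(AccessRequest("alice", True, "public-wifi", "payroll-db")))
```

Note that every request is evaluated from scratch: the same user on the same device is refused once the location attribute no longer satisfies the policy.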
Why should organizations adopt ZTA?
When adopting a ZTA, organizations must weigh all the potential benefits, risks, costs, and ROI. Core ZT outcomes should be focused on creating secure networks, securing data that travels within the network or at rest, reducing impacts during breaches, improving compliance and visibility, reducing cybersecurity costs and improving the overall security posture of an organization.
Lost or stolen data, ransomware attacks, and network and application layer breaches cost organizations dearly in both money and market reputation. It takes a lot of time and money for an organization to return to normal after a severe security breach. ZT adoption can help organizations avoid such breaches, which is key to surviving in today’s world, where state-funded hackers are always ahead of the game.
As with all technology changes, the biggest challenge to demonstrate higher ROI and lower cybersecurity costs is the time needed to deliver the desired results. Organizations should consider the following:
- Assess what components of ZTA pillars they currently have in their infrastructure. Integration of components with existing tools can reduce the overall investment needed to adopt ZTA.
- Consider including costs or impacts associated with risk levels and occurrences when doing ROI calculations.
- ZT adoption should simplify, and not complicate, the overall security strategy to reduce costs.
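One way to bring risk levels and occurrences into an ROI calculation is to estimate the expected annual loss (probability of a breach times its impact) before and after adoption. The sketch below is an assumption about how such a calculation might look, and every figure in it is invented for illustration.

```python
def expected_annual_loss(probability, impact):
    """Expected yearly loss: chance of a breach in a year times its cost."""
    return probability * impact


def risk_adjusted_roi(loss_before, loss_after, annual_cost):
    """Net benefit (avoided expected loss minus cost) divided by cost."""
    return (loss_before - loss_after - annual_cost) / annual_cost


# Hypothetical figures: ZTA lowers both the likelihood and the blast radius of a breach.
before = expected_annual_loss(0.10, 4_000_000)
after = expected_annual_loss(0.04, 2_000_000)
roi = risk_adjusted_roi(before, after, annual_cost=200_000)
print(f"Risk-adjusted ROI: {roi:.0%}")
```

Costs already covered by existing tools can be left out of `annual_cost`, which is how reusing current components improves the result.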
What are the threats to ZTA?
ZTA can reduce the overall risk exposure in an enterprise but there are some threats that can still occur in a ZTA environment.
- A misconfigured PE or PA could disrupt users trying to access resources. A misconfiguration by the security administrator could also let through access requests that would previously have been denied, allowing attackers or subjects to reach resources from which they were restricted before.
- Denial of service attacks on PA/PEP can disrupt enterprise operations. All access decisions are made by PA and enforced by PEP to make a successful connection of a device trying to access a resource. If the DoS attack happens on the PA, then no subject would be able to get access as the service would be unavailable due to a flood of requests.
- Attackers could compromise an active user account through social engineering, phishing, or other means, and impersonate the subject to access resources. Adaptive MFA may reduce the likelihood of such attacks, but in any enterprise, with or without ZTA, an attacker might still be able to access the resources to which the compromised user has access. Micro-segmentation may protect resources against these attacks by isolating or segmenting them using technologies like NGFW and SDP.
- Enterprise network traffic is inspected and analyzed by policy administrators via PEPs, but traffic from non-enterprise-owned assets can’t be monitored passively. Because that traffic is encrypted and deep packet inspection is difficult, an attack could reach the network from non-enterprise-owned devices. ML/AI tools and techniques can help analyze traffic to find anomalies and remediate them quickly.
- Vendors or ZT solution providers could cause interoperability issues if they don’t follow certain standards or protocols when interacting. If one provider has a security issue or disruption, it could potentially disrupt enterprise operations due to service unavailability or the time taken to switch to another provider which can be very costly. Such disruptions can affect core business functions of an enterprise when working in a ZTA environment.
[ACT-IAC] American Council for Technology and Industry Advisory Council (2019) Zero Trust Cybersecurity Current Trends. Available at https://www.actiac.org/zero-trust-cybersecurity-current-trends
Draft (2nd) NIST Special Publication 800-207. Available at https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-207-draft2.pdf
NIST Zero Trust Architecture Release: https://www.nccoe.nist.gov/projects/building-blocks/zero-trust-architecture
Most Agile transformation efforts in the government begin with the Scrum process. However, many agencies feel that they have reached a plateau and are ready to move through to the next logical steps. Improving digital services delivery and getting working software into the users’ hands shouldn’t stop with just Scrum. As agencies progress in their Agile transformation, they begin to see the value of adding the Agile engineering practices, such as Test-Driven Development and Continuous Integration to improve code quality and the downstream delivery of fully functional and tested software. And what about the challenges of scaling Agile for very large projects? What might a strategic progression for Agile transformation look like? This will be the focus of our ninth Agile in Government workshop, Agile Engineering, SAFe and DevOps: A Roadmap to Adoption at the Potomac Forum, Willard InterContinental Hotel on Thursday June 14, 2018.
Full Agenda and Registration can be found here.
Is your business undergoing an Agile Transformation? Are you wondering how DevOps fits into that transformation and what a DevOps roadmap looks like?
Check out a webinar we offered recently, and send us any questions you might have!
Recently, I was part of a successful implementation of a project at a big financial institution. The project was the center of attention within the organization mainly because of its value addition to the line of business and their operations.
The project was essentially a migration project and the team partnered with the product vendor to implement it. At the very core of this project was a batch process that integrated with several other external systems. These multiple integration points with the external systems and the timely coordination with all the other implementation partners made this project even more challenging.
I joined the project as a Technical Consultant at a rather critical juncture where there were only a few batch cycles that we could run in the test regions before deploying it into production. Having worked on Agile/Scrum/XP projects in the past and with experience working on DevOps projects, I identified a few areas where we could improve to either enhance the existing development environment or to streamline the builds and releases. Like with most projects, as the release deadline approaches, the team’s focus almost always revolves around ‘implementing functionality’ while everything else gets pushed to the backburner. This project was no different in that sense.
When the time had finally come to deploy the application into production, it was quite challenging in itself because it was a four-day continuous effort with the team working multiple shifts to support the deployment. At the end of it, needless to say, the whole team breathed a huge sigh of relief when we deployed the application rather uneventfully, even a few hours earlier than what we had originally anticipated.
Once the application was deployed to production, ensuring the stability of the batch process became the team’s highest priority. It was during this time that I felt the resistance to any new change or enhancement. Even fixes to non-critical production issues were delayed because of the fear that they could potentially jeopardize the stability of the batch.
The team dreaded deployments.
I felt it was time for me to build my case to have the team reassess the development, build and deployment processes in a way that would improve the confidence level of any new change that is being introduced. During one of my meetings with my client manager, I discussed a few areas where we could improve in this regard. My client manager was quickly onboard with some of the ideas and he suggested I summarize my observations and recommendations. Here are a few at a high level:
It’s common for these suggestions to fall through the cracks while building application functionality. In my experience, I have noticed they don’t get as much attention because they are not considered ‘project work’. What project teams, especially the stakeholders, fail to realize is the value in implementing some of the above suggestions. Project teams should not consider this as additional work but rather treat it as part of the project and include the tasks in their estimations for a better, cleaner end product.
In Daniel H. Pink’s book, Drive: The Surprising Truth About What Motivates Us, he discusses the motivations of knowledge workers. He makes the case that knowledge workers are driven by intrinsic factors and not the extrinsic factors of punishment and money. As he states, “Carrots & Sticks are so last Century. Drive says for 21st century work, we need to upgrade to autonomy, mastery, and purpose.” A great video covering his work is viewable at https://youtu.be/u6XAPnuFjJc. For more research on extrinsic and intrinsic motivation, start with the work of Edward Deci from the 1970s.
Here is an explanation of the three types of motivation:
Autonomy
This is the granting of control over their own work to those doing the work. Guidance is fine, but too much of it becomes micromanagement, which can be detrimental to motivation. Valuable feedback, performance metrics, and boundaries can be all that is needed.
Mastery
This is an innate desire to get better at doing some task. If the task is too easy, workers may get bored. If it is too hard and little progress is made, workers often get frustrated and give up. So tasks must be challenging, yet doable. Fostering an environment of continuous learning will also add to motivation.
Purpose
This is tying the work to a cause larger than the workers themselves. Workers who believe in that cause feel that the outcome of the work matters beyond just their own accomplishment.
According to Wikipedia:
DevOps is a software development method that stresses communication, collaboration, integration, automation, and measurement of cooperation between software developers and other information-technology (IT) professionals.
That certainly sounds like knowledge work to me. But are the three motivations the same for software developers and operations staff? And what might they be in a DevOps team? Let’s take a look in the chart below:
The management challenge then is to create a supportive culture where DevOps can flourish and the knowledge workers will be highly motivated by having aligned motivations of Autonomy, Mastery, and Purpose.
According to James P. Womack and Daniel T. Jones, “Lean Thinking is a business methodology which aims to provide a new way to think about how to organize human activities to deliver more benefits to society and value to individuals while eliminating waste.” In my opinion, Continuous Delivery and DevOps are the application of Lean Thinking to a part of the software development lifecycle: in particular, the processes that occur from the planning of software development to the deployment of the software into production.
But how do you know if you need Continuous Delivery and DevOps? Well, here are some typical candidates for applying Lean Thinking in your organization via Continuous Delivery (CD) and DevOps.
Long cycle time or lead time
A couple of lean metrics are important measures of the amount of time it takes to deliver value to the end user. One of them starts from the moment the feature is identified (lead time). The clock starts on the other when development of a feature begins (cycle time). DevOps is more applicable to the cycle time. If your organization has long cycle times, then CD and DevOps are a great approach to reveal where those delays are and start to eliminate them.
If your development team is using Scrum, then long cycle times may already be very transparent. The development team should be completing high-value features quickly and regularly. But that doesn’t mean those features are getting deployed quickly.
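To make the two metrics concrete, here is a minimal sketch of how lead time and cycle time could be computed from three or four timestamps per feature. The `Feature` record, its field names, and the sample dates are all hypothetical, and the extra `deployment_wait` measure (time between development finishing and the feature reaching production) is simply one assumed way to expose the gap described above for Scrum teams.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Feature:
    """Hypothetical record of the key dates in a feature's life."""
    name: str
    identified: datetime      # the feature is first identified
    dev_started: datetime     # development work begins
    dev_completed: datetime   # the team considers the feature "done"
    deployed: datetime        # the feature reaches production


def lead_time(feature: Feature) -> timedelta:
    """Clock starts when the feature is identified."""
    return feature.deployed - feature.identified


def cycle_time(feature: Feature) -> timedelta:
    """Clock starts when development of the feature begins."""
    return feature.deployed - feature.dev_started


def deployment_wait(feature: Feature) -> timedelta:
    """Time a finished feature sits undeployed (assumed extra measure)."""
    return feature.deployed - feature.dev_completed


# Made-up sample data for illustration only.
example = Feature(
    name="export-report",
    identified=datetime(2015, 1, 5),
    dev_started=datetime(2015, 2, 2),
    dev_completed=datetime(2015, 2, 13),
    deployed=datetime(2015, 3, 16),
)

print("lead time (days):", lead_time(example).days)
print("cycle time (days):", cycle_time(example).days)
print("waiting for deployment (days):", deployment_wait(example).days)
```

In this made-up example, development took less than two weeks, yet most of the cycle time was spent waiting for deployment, which is the kind of delay CD and DevOps aim to reveal.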
Mistakes during deployment
When deployments do occur in your organization, are you experiencing technical issues? Ever have a roll back? These mistakes can be the result of many different causes. Generally, they come about because of the large number of manual processes during deployment. Problems occur because of communication issues between teams and a lack of familiarity with the steps necessary to deploy. Does each deployment seem new every time?
Another symptom is an organization where there is a “hero mentality” regarding deployment. What I mean by this is that there is an expectation that there will be lots of problems during deployment and that some individual or small team will rescue the deployments by putting in lots of late hours, consuming multiple cans of soda, and eating pizza. This mentality is often even more entrenched when a particular individual becomes the “go to” person (the hero) for the deployments because only they know or can figure out how to do them. Sometimes the hero team or individual embraces this role, but often they really don’t want the stress and constraints that it entails.
Large overhead in deployment process
Usually, as a direct result of the bad deployments mentioned above, organizations start to place much heavier governance on the deployment processes in an attempt to prevent mistakes. This can be manifested via complex change control processes. Usually, these are manual and include a change control board and heavy documentation (readiness reviews, traceability, etc.). As a result, deployments are slowed down even more. Sometimes so much overhead can cause a rush to get things done at the end by those who do the deployments, which leads to more issues. Also, because more people are involved in communicating what needs to be done, there are further chances of errors occurring.
User impact from deployment
Does your organization take systems offline during deployments? How long are those downtimes? And once systems are brought back online, do users need to learn a large new feature set? Downtime can often have a negative impact on your mission to your users, and it drives a lot of organizations to deploy very infrequently. But the infrequency of deployment means that a lot of changes to the system are introduced during those deployments. The impact on users can be substantial and takes away from the value you are delivering to them.
Resistance from operations staff
Are your operations teams resistant to performing deployments? Would they rather see systems never change and just support the status quo? If so, then it is probably due to the complaints directed toward them because of the issues described above. Often they have little control to resolve those issues and feel blindsided by what they are getting from the development teams. I can assure you that it is a rare individual who enjoys the stress of a deployment gone wrong. Clearly DevOps can help with this.
Of course, there are other measurements and policies that can be used to assess if you need to make changes to a non-DevOps environment or even improvements in your DevOps environment. Do you have more ideas or want to know more about assessing your need for DevOps, Continuous Delivery, or Continuous Deployment? Leave a comment below or contact us.
As most of you who operate in the Federal space are probably aware at this point, many Federal agencies are now utilizing Agile methods such as Scrum to manage their software development efforts. The goal for most of them is to reduce risk and accelerate system delivery to their end users. By using Scrum with the development team they have achieved part of their goal. But major risks and speedbumps still exist after the software is developed. These are encountered during deployment by the operations groups and are normally outside the purview of the development team.
The de facto approach to this issue in the private sector is Continuous Delivery and DevOps. That same approach is now being successfully applied to the public sector. Just how well is the government doing in its attempts to adopt this private sector best practice? On November 18th Dr. David Patton, Federal Practice Director, and Ashok Komaragiri, Senior Technical Consultant, both with CC Pace, will be joined by Joshua Seckel and Jaya Kathuria from the Department of Homeland Security, Tina M. Donbeck from the U.S. Patent & Trademark Office and John D. Murphy, with the National Geospatial-Intelligence Agency, to take an in-depth look at the state of DevOps in the Federal government.
For additional information visit: http://www.potomacforum.org/content/agile-development-government-training-workshop-vi-devops-%E2%80%93-taking-agility-government-new
In my last post, I talked about the interesting Agile 2015 sessions on team building that I’d attended. This time we’ll take a look at some sessions on DevOps and Craftsmanship.
On the DevOps side, Seth Vargo’s The 10 Myths of DevOps was by far the most interesting and useful presentation that I attended. Vargo’s contention is that the DevOps concept has been over-hyped (like so many other things) and that people will soon become disenchanted with it (the graphic below shows where Vargo believes DevOps stands on the Gartner Hype Cycle right now). I might quibble about whether we’ve passed the cusp of inflated expectations yet or not, but this seems just about right to me. It’s only recently that I’ve heard a lot of chatter about DevOps and seen more and more offerings, and that’s probably a good indication that people are trying to take advantage of those inflated expectations. Vargo also says that many organizations either mistake the DevOps concept for just plain operations or use the title to try to hire SysAdmins under the more trendy title of DevOps. Vargo didn’t speak to it, but I’d also guess that a lot of individuals are claiming to be experienced in DevOps when they were SysAdmins who didn’t try to collaborate with other groups in their organizations.
The other really interesting myth in Vargo’s presentation was the idea that DevOps is just between engineers and operators. Although that’s certainly one place to start, Vargo’s contention is that DevOps should be “unilaterally applied across the organization.” This was characteristic of everything in Vargo’s presentation: just good common sense and collaboration.
Abigail Bangser was also focused on common sense and collaboration in Team Practices Applied to How We Deploy, Not Just What, but from a narrower perspective. Her pain point seems to have been that technical stories weren’t well defined and were treated differently than business stories. Her prescription was to extend the Three Amigos practice to technical stories and generally treat technical stories like any other story. This was all fine, but I found myself wondering why that kind of collaboration wasn’t happening anyway. It seems like one should do one’s best to understand a story and deliver the best value regardless of whether the story is a business or a technical one. Alas, Bangser didn’t go into how they’d gotten to that state to start with.
On the craftsmanship side, Brian Randell’s Science of Technical Debt helped us come to a reasonably concise definition of technical debt and used Martin Fowler’s Technical Debt Quadrant to distinguish between different types of technical debt: prudent vs. reckless, and deliberate vs. inadvertent. He also spent a fair amount of time demonstrating SonarQube and explaining how it had been integrated into the .NET ecosystem. SonarQube seemed fairly similar to NDepend, which I’ve used for some years now, with one really useful addition: both NDepend and SonarQube evaluate your codebase against various configurable design criteria, but SonarQube also provides an estimated time to fix all the issues that it found with your codebase. Although it feels a little gimmicky, I think it would be more useful than just having the number of instances of failed rules when explaining to Product Owners the costs that they are incurring.
I also attended two divergent presentations on improving our quality as developers. Carlos Sirias presented Growing a Craftsman through Innovation & Apprenticeship. Obviously, Sirias advocates for an apprenticeship model, a la blacksmiths and cobblers, to help improve developer quality. The way I remember the presentation, Sirias’ company, Pernix, essentially hired people specifically as apprentices and assigned them to their “lab” projects, which are done at low-cost for startups and small entrepreneurs. The apprenticeship aspect came from their senior people devoting 20% of their time to the lab projects. I’m now somewhat perplexed, though, because the Pernix website says that “Pernix apprentices learn from others; they don’t work on projects” and the online PDF of the presentation doesn’t have any text in it, so I can’t double check my notes. Perhaps the website is just saying that the apprentices don’t work as consultants on the full-price projects, and I do remember Sirias saying that he didn’t feel good about charging clients for the apprentices. On the other hand, I can’t imagine that the “lab” projects, which are free for NGOs and can be financed by micro-equity or actual money, don’t get cross-subsidised by the normal projects. I feel like just making sure that junior people are always pairing and get a fair chance to pair with people they can learn from, which isn’t always “senior” people, is a better apprenticeship model than the one that Sirias presented.
The final craftsmanship presentation I attended, Steve Ropa’s Agile Craftsmanship and Technical Excellence, How to Get There, was both the most exciting and the most challenging presentation for me. Ropa recommends “micro-certifications,” which he likens to Boy Scout merit badges, to help people improve their technical abilities. It’s challenging to me for two reasons. First, I’m just not a great believer in credentialism because I don’t find that credentials really tell me anything when I’m trying to evaluate a person’s skills. What Ropa said about using internally controlled micro-certifications to show actual competence in various skill areas makes a lot of sense, though, since you know exactly what it takes to get one. That brings me to the second challenge: the combination of defining a decent set of micro-certifications, including what it takes to get each certification, and a fair way of administering such a system. For the most part, the first part of this concern just takes work. There are some obvious areas to start with: TDD, refactoring, continuous integration, C#/Java/Python skills, etc., that can be evaluated fairly objectively. After that, there are some softer areas that would be more difficult to figure out certifications for, though. How, for example, do you grade skills in keeping a code base continually releasable? It seems like an all-or-nothing kind of thing. And how does one objectively certify a person’s ability to take baby steps or pair program?
Administering such a program also presents me with a real challenge: even given a full set of objective criteria for each micro-certification, I worry that the certifications could become diluted through cronyism or that the people doing the evaluations wouldn’t be truly competent to do so. Perhaps this is just me being overly pessimistic, but any organization has some amount of favoritism, and I suspect that the sort of organizations that would benefit most from micro-certifications are the ones where that kind of behavior has already done the most damage. On the other hand, I’ve never been a Boy Scout and these concerns may just reflect my lack of experience with such things. For all that, the concept of micro-certifications seems like one worth pursuing, and I’ll be giving more thought to how to successfully implement such a system over the coming months.