Where Do We Go Wrong With Agile

A few days ago I published this blog mocking the ‘Ceremony Focused Agile’ teams. But it is pointless to state what one thinks is not right, without also commenting on what one thinks is right! So let us do that today. Here is a conversation, which I am sure many of us have witnessed (or been part of):

A: “I have assigned a ticket to you, what is the status of it?”

B: “Ticket, what ticket?”

A: “It’s in JIRA, check your board.”

B: “Okay, but what is it about? What is the context?”

A: “It’s in JIRA”

B: “huh?”

In Agile teams, people believe in using JIRA to track their work, hours spent, communication and discussion about a feature, proposed features and discarded features; it's this amazing one-stop shop for all Agile stuff. What a tool! (pun not intended!) We are told that using JIRA helps us track time, keeps everything organised, and ensures no one can go back on their words or commitments. Which is true: using a single, capable tool for tracking everything related to a task ensures that everything that happened in the context of that task is recorded in one place. But let us not mistake using JIRA for practising Agile.

Agile Manifesto

The Agile Manifesto makes four very clear and concise statements:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

Yet, the reality is that these statements are understood by everyone differently. Some folks took away a different meaning from the word Agile itself, some did from the manifesto. Over time certain practices appeared, which seemed to work for certain use cases effectively. And many of us seem to have assumed their success is absolute, and expect them to work outside of the teams they were successful in. Somewhere we forgot, every team is different, so is agile for every team. It is certainly not a set of ‘mandatory rules’ that apply to every team the same way.

Then What Is Agile

AgilE to me is AtoE, or ‘Adapt to Evolve’. All the statements in the manifesto promise that when you change focus from tools/processes to people, from documentation to software, from contract to customer and from a fixed plan to a flexible plan, you will be able to evolve to match market needs and deliver a successful product. But then, how do we do it? Does it mean that we should stop using all tools/processes? Or signing contracts? Certainly not, these things are still important, but there is only so much that they can do. The manifesto tries to prioritise them, stating that doing is more important than planning, recording and tracking. That does not mean these are unimportant, only that they are of less importance than the action itself.

When we put the focus on people, and reduce the importance of processes and tools in the day-to-day life of these people, they interact more and better. This interaction results in better understanding and integration of the team itself, which in turn results in better delivery. There is no tool that can match a person-to-person interaction. I understand that the interactions may at times not be face to face, but people have always communicated better when they talk than when they write. ‘Intent’ is always difficult to convey in writing, because writing lacks tone. Also, these tools are asynchronous, adding further delays.

In the words Working Software, ‘working’ I believe is the important term. Working does not just mean ‘able to execute’; working means software fulfilling its purpose. Software is built to address a problem statement. As time passes the problem statement evolves, maybe to adapt to the market, or to the changing landscape, or to fill a niche, but always with the intent of making the solution more relevant. And hence the definition of ‘working’ keeps changing. And so the software should too. Software is a way of specifying the requirements. This specification is a set of instructions, given to a machine to perform an algorithm, written in a way that can be clearly understood by others contributing to the specification. Also, if the software is supposed to replace the comprehensiveness of the documentation, without impacting the intended purpose of the documentation itself, then the same job must be done by the software equally well! In other words, the specification, i.e. the code, should read like documentation! Now these, I believe, constitute the ‘working’ of the software.

Engaging with the customer helps teams better understand their perspective, which is reflected in the software. Contracts are required to secure the business aspect of software development, but they should not be a hindrance to value creation. The first and foremost job of an Agile team is to bring value to the customer, also referred to as the Stakeholder. In the rawest of terms, it is more important to see if and how a task being done will bring value to the customer than to look for a way to create a new Change Request. 😉 Customers in the software services industry are mostly external to the company or team, and so collaboration becomes all the more important. In poetic terms, it is important to think of the software as ‘for us’ and not ‘for client’, and involving the customer aids this thinking.

It is a bit cliché to mention this, but ‘Change is the only constant’. There is no defined long-term plan that can be executed exactly as expected, especially when the definition of ‘working’ itself is so prone to change. We can define a direction and an end goal, but we cannot define a strict plan, and even if we could, I am not sure it would help in a changing context, except when ‘change is the plan’! Being able to incorporate change is the primary goal of an Agile team. Interestingly, when comparing different Agile frameworks you can see that one of the most important differentiating factors is the ‘frequency at which to expect change’. An Agile team should be ready for and open to change rather than strict about adhering to a plan. There are still plans, but on a much smaller scale.

The Unsaid Requirements Of Agile

With all the points above, agile is said to give control to the team. An overused, always misunderstood term, “giving control”. And what does it mean? The ability to start or stop something, the ability to take decisions and execute them, that is control. Now I wonder, is it possible to give control without having ‘trust’? I do not see that as possible, ever. Would you ever give control of your car (project execution) to someone and ride along, without first trusting them not to kill you (intentionally or unintentionally)? Not possible. Traditionally, the management has always enjoyed this control. They have always believed that them making the decisions and the team obeying and executing them is the right way. Agile requires them to relinquish this control, and it is certainly not going to be easy unless they trust the team to do the right thing.

So, the ‘management’ of an Agile team needs to first trust the team.

Now, would you trust a driver with no driving skill? Nope! The team needs to have the skill to make the right choices, execute and deliver. Without this skill it is not possible for the management to trust the team, and without trust it is not possible for a team to be Agile.

An Agile team needs to have the required skill-set to make the project a success.

Would you trust a skilled but irresponsible driver with your car and life? Would you trust a skilled, responsible yet unwilling driver to take you to your destination? Certainly not; you and your car might end up in a ditch while the driver, unscathed, moves on to drive a different car! (analogy!) It is irrelevant if the driver is skilled; unless the driver is willing to take you to your destination, her skills are useless. The goals must align.

An Agile team must believe in the same goal as the management, must be willing to do what is needed to get there.

I always see the last requirement as a bit tricky. Why would a team of skilled, free-thinking individuals believe in someone else’s goal? This is where the ‘people interactions’ come in. It is not going to happen unless the team trusts the management to do right by them. Ah, it’s a game of trust, skill and will. There is no Agile without these. These assumptions should have been recorded somewhere, because this is the part that many teams and many traditional managers fail to understand. What this results in is a ceremony we like to call Agile.

The Ceremony

There are many ‘frameworks’ of Agile: many different ways that can help you implement the manifesto better, each with different guidelines, like all thought-processes have. But these are guidelines, not rules. They cannot be forced on an unwilling team to beat them into being Agile. The core concept of Agile is that the team decides the practices they want to follow, and in which form, to identify flaws and improve on them. (Remember, we already trust that the team is skilled and is willing to do right by this project.)

When we see meetings, call timings and statements forced upon a team, they become mere ceremony. They lose their meaning and purpose, and the result is a failure: a failure to achieve the goal, a failure to build the team and a failure of the practice itself.

There are certain tools and practices, though, that are explained as being a part of Agile. These again are not rules, but arise from the need to respond to change rapidly. We need to deliver fast, and to do that we need faster verification of software, hence the need for Continuous Integration. To be able to deploy fast we need Continuous Delivery, so the artefacts are ready to go live; daring teams can even try Continuous Deployment. To deploy fast, we need to make the provisioning and configuration of systems automated and simplified, hence the need for DevOps. A car can go only as fast as the brakes allow, so to deliver fast we also need to be able to revert fast, hence the need for artefact repositories and blue-green deployments. We need to change the specification quickly, and so we need to verify the specification at a granular level, hence the need for Unit Tests. Since we are so focused on time, we should write specifications only for what is required, hence TDD/BDD. Since the specification’s job is to convey intent to others, we need to have more than one person on the team who can understand the code, and to save time and effort, we have Pair Programming. Again, just following these practices does not make you Agile, similar to how not following some of them does not make you ‘not Agile’. Using tools such as Jenkins/GoCD, JIRA or Artifactory/Nexus does not make you Agile, and not using them, in favour of a better alternative your team has established which allows you to act faster, does not make you ‘less Agile’.

As for the many other terms you will hear in Agile, know that they are not part of any specification, requirement or rule. Some of these things may help you be Agile, but the Agility is always in the context of what your team thinks is necessary to achieve the goal for your Customers.

The Markers

Enough theory, how do I tell if my team is Agile? Well, I have tried to build a list of markers that I have seen in non-agile Agile teams. Now this is certainly not an exhaustive list, and these are certainly not rules, but indicators that can help identify ceremony rather than agility.

  • You have a “manager”.
  • Your manager/scrum master, or someone ‘assigns’ you ‘tasks’, rather than you picking them.
  • This someone asks you for ‘status’ in your daily meetings. This is a bit tricky, remember you have control, and so you have responsibility to convey the status of the work you picked. It might be a failure on your side or management’s.
  • If your team has members reporting to different people, not in hierarchy.
  • If you learn about your tasks only via a tool, and also report via a tool.
  • If you prefer a tool or email over talking/quick calls when you need to discuss with your team.
  • If you have to jump through hoops and cc ten people to be able to talk to the customers.
  • If your productivity is measured solely in terms of ‘tickets’ fixed or moved.
  • If you as a team never meet to discuss what can be improved or you conclude that nothing can be!
  • If you do not know what others in your team are working on, blocked on and you are not helping to unblock them.
  • If you as a team are not driving to complete the goal of the iteration as a whole, and instead focus on finishing your work alone.
  • If you as an individual are not learning any new skill required by your work or performed in the team.
  • You have not changed the way you are working / following Agile practices in a long long time!

 

We need to talk, says one microservice to another

‘But how?’ asks the other service!
Ever wondered how we communicate? One would not believe how complex and multi-step a process it is. It involves some very involved terms like perception, encoding, medium and decoding. Let us take a look at a diagram explaining this:

So what is the relation with microservices? Communication between two services is not much different. It goes through a process very similar to this; in fact it can be explained with the exact same steps! Consider two services: a ShippingService written in Java requests the preferred address of a user, identified by Id, from a UserAddressService over a REST call.
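The request described above can be sketched as a minimal, in-process simulation of the same steps: encode, send, decode, perceive and respond. This is only an illustration under loud assumptions: both "services" are plain methods in one class, the "network" is a method call, the `key=value` wire format stands in for JSON, and every name here (`AddressLookupSketch`, the field names, the city) is hypothetical rather than taken from any real system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: both "services" live in one process and the "network" is a method call.
class AddressLookupSketch {

    // Encoding: turn the sender's object (a user id) into a wire message.
    static String encodeRequest(long userId) {
        return "userId=" + userId;
    }

    // Decoding: turn a wire message back into key/value pairs.
    // Assumes values never contain '&' or '='; a real format would escape them.
    static Map<String, String> decode(String wire) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : wire.split("&")) {
            String[] kv = pair.split("=", 2);
            fields.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return fields;
    }

    // Receiver (stand-in for UserAddressService): decode, interpret, reply.
    static String userAddressService(String wireRequest) {
        Map<String, String> request = decode(wireRequest);
        // Perception: the receiver must look for the field name the sender used.
        String userId = request.get("userId");
        if (userId == null) {
            return "status=400";
        }
        // Hypothetical address data, hard-coded for the sketch.
        return "status=200&city=Exampleville&userId=" + userId;
    }

    // Sender (stand-in for ShippingService): encode, send, wait for feedback, decode.
    static Map<String, String> preferredAddress(long userId) {
        String wireRequest = encodeRequest(userId);
        String wireResponse = userAddressService(wireRequest); // the "medium"; really an HTTP call
        return decode(wireResponse);
    }

    public static void main(String[] args) {
        System.out.println(preferredAddress(42L));
    }
}
```

Each method maps to one step of the human-communication picture, which is the point of the sketch: any of them can go wrong independently of the others.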

Okay, so human communication can be used as an analogy to understand inter-service communication in a microservices stack. So what? Basically, it tells us that the failure scenarios can also be the same, and that we can understand microservice communication by relating it to human communication.

Why

Before we deep dive, we should quickly consider why it matters at all in microservices. It is basically the difference between inter-process and intra-process communication. Consider this: intra-process, meaning communication between components of a single process, is like talking to oneself. You understand yourself well (usually!); at least there is a whole lot less ‘miscommunication’ when talking to oneself. This is the kind that happens in monoliths, which is why it was not so much of an issue until we started working on microservices, which have inter-process communication. To relate to this, consider talking to people instead: known or unknown (API, authentication etc), people of different race, origin, nationality (perception), at a distance or near you (response time), on a phone vs in person (medium/network, protocol), direct vs indirect or at a different time (whoa! yeah, i.e. sync vs async, like leaving a note), speaking an unknown language (encoding-decoding, JSON/XML/binary like gRPC), of a different background or upbringing (perception again), in different mental states (well, stateless vs stateful services), telling them a secret or casual gossip (secure vs insecure). Now you know why working in teams is so difficult, and also microservices!

Types

Let us break down all the phases and see the types of potential problems. This is not some standard classification, but I have found this method effective when understanding the issues with analogies.

  1. Encoding: Failure to encode the object correctly, by missing attributes or encoding them with a different name. For example, frameworks in Java allow for choosing different attribute names in the JSON/XML format; these are specified as strings and there is no real validation on them other than tests. Also, how do we ensure that both services are using the same message structure? This can in part be ensured by sharing a common object model library. But then this goes against the principle of knowing models by views; not every service needs to know what all the attributes of a Customer object are! You might as well use queues for communication then.
  2. Sending the message: This covers the whole set of address-related issues that occur when the message is not sent on the correct protocol, address, port or path. Integration testing cannot always help here, as some of these depend on the production infrastructure as well.
  3. Decoding: Being the reverse of encoding, this process comes with all the same issues.
  4. Perception: Now these are some of the serious issues which you tend to miss even in unit and integration tests, and which can be caught only in later phases, when working against live services. If you encounter these, you can most certainly assume that apart from inter-service communication issues, you also have some team communication issues.
  5. Feedback: One of the most important steps in communication is the feedback, the acknowledgement of receipt of the message. There can be plenty of causes for a service not responding, including all of the issues discussed above, plus potential networking issues and issues related to the health of the service.
  6. Cascading Failures: A much more serious situation, which is merely an outcome of failing to identify the issues in service communication and to safeguard against them.
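To make the encoding and decoding items concrete, here is a small sketch of the "different attribute name" failure and of a check that would catch it. It is a hypothetical illustration, not how any particular framework works: the field names `user_id` and `userId`, the class name and the helper methods are all assumptions made up for the example, and a `Map` stands in for a serialised message.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FieldNameMismatchSketch {

    // What the sender writes on the wire. In real frameworks these names are
    // often plain strings in annotations, so the compiler cannot check them.
    static Map<String, String> encodeBySender(String userId, String city) {
        return Map.of("user_id", userId, "city", city);
    }

    // What the receiver expects to find when decoding (hypothetical names).
    static final Set<String> EXPECTED_BY_RECEIVER = Set.of("userId", "city");

    // A contract-style check: which expected fields are absent from the message?
    static Set<String> missingFields(Map<String, String> message) {
        Set<String> missing = new TreeSet<>(EXPECTED_BY_RECEIVER);
        missing.removeAll(message.keySet());
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> message = encodeBySender("42", "Exampleville");
        // Reports the field the receiver would silently decode as null.
        System.out.println("Missing on the receiving side: " + missingFields(message));
    }
}
```

Running a check like this in a test that both teams own is one way to catch the mismatch without sharing the whole object model between services.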

Now that we are acquainted with the terms, I want to tell you a couple of stories; really scary, real stories.

Case study 1

I know of a Value Added Services (VAS) company; you know, the services you never want yet your carriers charge you for, yeah, those. This company was one of the few trying to build a useful service you may want to pay for, trying to play fair and by the rules, even at a revenue loss. The VAS industry suffers from frequent fraud, and so it was said that the biggest crime is charging your users twice. Such incidents would be immediately flagged as fraud by the carriers and the service would be banned.

We had a call flow where a “Billing Service” makes a call to a “Carrier Integration Service” for charging, which in turn, after firing the charging request to the carrier, makes a call to an external-internal Notification Service (a shared service hosted by a different unit) to send a notification to the user. It worked fine for years, until one day this external-internal service suddenly slowed down, timing out all the calls from the Carrier Integration Service, which in turn caused a timeout on the Billing Service, which treated it as a failure and ‘retried’ the billing call. The issue got flagged in minutes, but hundreds of users were charged multiple times that day before the team could respond.

There were a whole lot of things that went wrong here. There were feedback issues, which likely occurred due to a network issue, giving rise to perception issues and finally causing a cascade. The worst thing, though, was the cascade.
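The failure chain in this story can be reproduced in a few lines. The sketch below is a simplified simulation under stated assumptions, not the actual services: the class and method names are hypothetical, the carrier is a counter in a map, and the slow Notification Service is modelled as an exception thrown after the charge has already been fired.

```java
import java.util.HashMap;
import java.util.Map;

class DoubleChargeSketch {

    // Models the timeout surfacing from the slow notification call.
    static class NotificationTimeout extends RuntimeException {
        NotificationTimeout() { super("notification service timed out"); }
    }

    // Stand-in for the carrier: how many times each user was actually charged.
    static final Map<String, Integer> chargesAtCarrier = new HashMap<>();
    static boolean notificationSlow = true;

    // Carrier Integration Service: fires the charge first, then notifies the user.
    static void chargeThenNotify(String userId) {
        chargesAtCarrier.merge(userId, 1, Integer::sum); // the charge has already happened
        if (notificationSlow) {
            throw new NotificationTimeout();             // but the caller only sees a failure
        }
    }

    // Billing Service: reads "no feedback" as "not charged" and retries.
    static boolean bill(String userId, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                chargeThenNotify(userId);
                return true;
            } catch (NotificationTimeout e) {
                // Wrong perception: the charge succeeded, only the notification did not.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        bill("user-1", 2);
        System.out.println(chargesAtCarrier.get("user-1")); // prints 2: charged twice
    }
}
```

The retry itself is harmless; the damage comes from retrying an operation whose side effect had already happened, on the strength of feedback that described a different step.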

Case study 2

Another such story, not so grave, was when the Billing Service sent a request for partial charging, but the Carrier Integration Service refused to honour it. Both services used enums to identify the keyword used, both had integration tests, and all worked fine. It was only after the services were tested against working instances of one another, in a pre-prod environment on a crunch day, that the problem was identified. The issue was stupidly simple: the two services used different values for the constant! Again, a whole lot of things went wrong here; the most important was the perception issue caused by the miscommunication between the teams working on these two services.
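A sketch of what such a mismatch can look like follows. The actual constants from the incident are not given in this post, so the wire values `PARTIAL` and `PART` below, like the enum and method names, are purely hypothetical. Each enum is internally consistent, which is why each service's own tests would pass.

```java
import java.util.Optional;

class ChargeKeywordMismatchSketch {

    // The Billing Service's view of the keyword (hypothetical values).
    enum BillingChargeType {
        FULL("FULL"), PARTIAL("PARTIAL");

        final String wireValue;
        BillingChargeType(String wireValue) { this.wireValue = wireValue; }
    }

    // The Carrier Integration Service's view: same constant name, different wire value.
    enum CarrierChargeType {
        FULL("FULL"), PARTIAL("PART");

        final String wireValue;
        CarrierChargeType(String wireValue) { this.wireValue = wireValue; }

        // Decoding on the receiving side: unknown keywords yield an empty result.
        static Optional<CarrierChargeType> fromWire(String value) {
            for (CarrierChargeType type : values()) {
                if (type.wireValue.equals(value)) {
                    return Optional.of(type);
                }
            }
            return Optional.empty();
        }
    }

    // A cross-service check: does the receiver understand what the sender emits?
    static boolean carrierUnderstands(BillingChargeType sent) {
        return CarrierChargeType.fromWire(sent.wireValue).isPresent();
    }

    public static void main(String[] args) {
        for (BillingChargeType type : BillingChargeType.values()) {
            System.out.println(type + " understood by carrier: " + carrierUnderstands(type));
        }
    }
}
```

A check like `carrierUnderstands` only has value when it runs against the other team's real definitions, which is exactly what was missing until pre-prod.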

The Solutions

Discussing every step identified to fix the case studies we saw would be a story in itself, so we shall discuss only the first solutions applied to the most glaring causes in both cases. For the cascade, we made it a point that every service will implement Hystrix for every single inter-service communication call. Hystrix is a circuit breaker, meaning it wraps a chunk of logic (say a method) and on exception it can flag, throttle, block and bypass the method in question. The idea is to wrap the calls to external services, aka dependencies, and when something goes wrong, give them time to recover by bypassing calls, and safeguard the sender service from cascading the issue. We had Hystrix in some services, but some teams had argued that it is an unnecessary complication for internal services and for services that have been working fine. Well, that was before the incident. Everyone just jumped on it as the first change once we recovered from the impact. In my view, a circuit breaker is a tool microservices should never be built without. As an additional safety net, we also ensured that the Carrier Integration Service builds a temporary cache of all the users it has processed and validates against it before it fires any call (not every solution to a problem is purely technical!).
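To show the idea rather than the library, here is a hand-rolled sketch of a circuit breaker plus the "already processed" guard. It is deliberately NOT the Hystrix API: the classes `CircuitBreaker` and `ChargeGuard`, the threshold and the cool-down period are assumptions invented for this example, and the code is not thread-safe.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

class CircuitBreakerSketch {

    // Minimal breaker: opens after N consecutive failures, stays open for a cool-down.
    static class CircuitBreaker {
        private final int failureThreshold;
        private final long openMillis;
        private int consecutiveFailures = 0;
        private long openedAt = -1; // -1 means the circuit is closed

        CircuitBreaker(int failureThreshold, long openMillis) {
            this.failureThreshold = failureThreshold;
            this.openMillis = openMillis;
        }

        boolean isOpen() {
            return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
        }

        <T> T call(Supplier<T> dependency, Supplier<T> fallback) {
            if (isOpen()) {
                return fallback.get(); // bypass: give the dependency time to recover
            }
            try {
                T result = dependency.get(); // closed, or a trial call after the cool-down
                consecutiveFailures = 0;
                openedAt = -1;
                return result;
            } catch (RuntimeException e) {
                consecutiveFailures++;
                if (consecutiveFailures >= failureThreshold) {
                    openedAt = System.currentTimeMillis(); // trip (or re-trip) the breaker
                }
                return fallback.get();
            }
        }
    }

    // The extra safety net: remember who was processed and refuse to repeat.
    static class ChargeGuard {
        private final Set<String> alreadyProcessed = new HashSet<>();

        // Returns true only the first time a user id is seen.
        boolean firstTime(String userId) {
            return alreadyProcessed.add(userId);
        }
    }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(2, 60_000);
        Supplier<String> slowNotification = () -> { throw new IllegalStateException("timed out"); };
        for (int i = 0; i < 4; i++) {
            System.out.println(breaker.call(slowNotification, () -> "notification skipped"));
        }
        System.out.println("breaker open: " + breaker.isOpen());
    }
}
```

In the incident's terms, the breaker would sit around the notification call so that its slowness never reaches the Billing Service, while the guard makes a repeated charge request for the same user a no-op.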

For the perception issue, we need to ensure that after the encode-decode round trip the receiver understands the same thing as the sender meant. We had hosted stubs against which we tested the services in the automation testing phase. These stubs were dumb services implementing the same API as the service they stubbed. They were developed by whichever team needed them; essentially, the Billing Service team would never develop the stub for the Billing Service, which caused the discrepancy between the stub and the actual service behaviour. This had to change: the team developing the Billing Service was to be responsible for developing the stubs for the Billing Service, and the team for the Carrier Integration Service for its service stubs. This way, the stubs always perceived the same as the actual service did.
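One way to picture why stub ownership matters is a check that compares a stub with the real service on the same inputs. The sketch below is an assumption-heavy illustration, not the team's actual setup: the interface, the three implementations and the keywords `FULL`, `PART` and `PARTIAL` are all hypothetical.

```java
import java.util.List;

class StubContractSketch {

    // The API that both the real service and any stub of it expose.
    interface CarrierIntegrationApi {
        boolean accepts(String chargeKeyword);
    }

    // Stand-in for the real service's behaviour (hypothetical rules).
    static class RealCarrierIntegration implements CarrierIntegrationApi {
        public boolean accepts(String chargeKeyword) {
            return chargeKeyword.equals("FULL") || chargeKeyword.equals("PART");
        }
    }

    // A stub written by the consuming team: it guesses and accepts anything.
    static class ConsumerWrittenStub implements CarrierIntegrationApi {
        public boolean accepts(String chargeKeyword) {
            return true;
        }
    }

    // A stub written by the owning team: it mirrors the real rules.
    static class OwnerWrittenStub implements CarrierIntegrationApi {
        public boolean accepts(String chargeKeyword) {
            return chargeKeyword.equals("FULL") || chargeKeyword.equals("PART");
        }
    }

    // Contract check: the candidate must answer every probe like the reference does.
    static boolean behavesLike(CarrierIntegrationApi reference,
                               CarrierIntegrationApi candidate,
                               List<String> probes) {
        for (String probe : probes) {
            if (reference.accepts(probe) != candidate.accepts(probe)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> probes = List.of("FULL", "PART", "PARTIAL");
        CarrierIntegrationApi real = new RealCarrierIntegration();
        System.out.println("consumer stub matches: " + behavesLike(real, new ConsumerWrittenStub(), probes));
        System.out.println("owner stub matches: " + behavesLike(real, new OwnerWrittenStub(), probes));
    }
}
```

The owning team is the one best placed to keep such a check green, because it knows when the real rules change.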

Now how do we address intra-team communication problems, anyone?

Software development hygiene: Why do we brush our teeth?

Yes, why do ‘you’ brush your teeth?
Is it guaranteed that if we brush our teeth twice a day, floss once a day and gargle with an antiseptic, we will never have toothache or bad breath? And if we did not brush our teeth for, say, a week, would we be guaranteed to have toothache? For a few months, maybe yes, we might, just might, have to get some treatment done for a few teeth. So the question is, why do we brush our teeth daily?

And how did we start brushing the teeth? Were we born with a brush in one hand, toothpaste in other and with an utter, inexplicable desire to brush teeth every morning after waking up from sleep and before going back to bed? Assuming that no one would remember how they themselves were born, all parents at least will agree with me, that this is certainly not the case. So the question, how did we start brushing our teeth daily?

And now the question you might have in your mind: “What’s the point?”
Recently, a person on our team raised these questions: Why do we have unit tests? I have been writing good code, good enough that QAs do not find any critical issues, nor has anything ever severely broken in production because of my changes, so why should I write tests? If I could think of all the scenarios to unit test, why do we have dedicated QAs on our team? Why should I pass my code through a static code quality analysis tool? All these processes are slowing us down. I have worked without all these processes in the past and that has worked quite well, so why do I need this overhead of processes?

I agree, I hate processes.
Yet we need to appreciate the importance of processes and acknowledge where they are required. Come to think of it, why does a process exist? Can we not work without processes and the overheads thereof? Short answer: No, we cannot. Long answer: We can, given that everyone on the team understands the core reasoning for the existence of the process being bypassed and takes the responsibility of upholding the goal of this process without strict adherence to the process itself.

Well, how did I start brushing my teeth daily? My mother would tell me: till I was a couple years old, she used to brush my teeth. When I became three, she taught me how to do it and would ask me to show how clean my teeth were. She would ask me: “Are they shining when you look in the mirror?”, I would go and check and say “Yes”. When I became four, she would just remind me to brush, and I think at five I had finally started brushing my teeth daily, without having to show her how clean they were. I do not believe your story would be very different than this. It took years of practice and perseverance of our parents to eventually get us to brush our teeth daily so that finally we could get rid of the ‘overhead of process’.

Yes, many processes can be chucked as long as the goal is achieved; but are we, as a team, responsible enough to make sure they are achieved every single time? Let us say we are, but are we ready to carry the burden of remembering every single code smell, every single potential bug and be mindful of it while writing code? Is that even humanly possible? If the answer to that question is yes, sure go ahead and chuck the quality analysis tools, unit tests, pull requests and code reviews; we don’t need them. But if the answer is no, wait till it becomes yes!
We can certainly bypass processes and get an apparent speed-up, but chucking a process before we are ready is sure to give us pain in the tooth (and in a few more places)!