Tuesday, June 25, 2024

Measuring Developer Productivity via Humans

Somewhere, right now, a technology executive tells their directors: “we
need a way to measure the productivity of our engineering teams.” A working
group assembles to explore potential solutions, and weeks later, proposes
implementing the metrics: lead time, deployment frequency, and number of
pull requests created per engineer.

Soon after, senior engineering leaders meet to review their newly created
dashboards. Immediately, questions and doubts are raised. One leader says:
“Our lead time is two days which is ‘low performing’ according to those
benchmarks – but is there actually a problem?”. Another leader says: “it’s
unsurprising to see that some of our teams are deploying less often than
others. But I’m not sure if this spells an opportunity for improvement.”

If this story arc is familiar to you, don’t worry – it’s familiar to
most, including some of the biggest tech companies in the world. It’s not uncommon
for measurement programs to fall short when metrics like DORA fail to provide
the insights leaders had hoped for.

There is, however, a better approach. An approach that focuses on
capturing insights from developers themselves, rather than solely relying on
basic measures of speed and output. We’ve helped many organizations make the
leap to this human-centered approach. And we’ve seen firsthand the
dramatically improved understanding of developer productivity that it provides.

What we’re referring to here is qualitative measurement. In this
article, we provide a primer on this approach derived from our experience
helping many organizations on this journey. We begin with a definition of
qualitative metrics and how to advocate for them. We follow with practical
guidance on how to capture, track, and utilize this data.

Today, developer productivity is a critical concern for businesses amid
the backdrop of fiscal tightening and transformational technologies such as
AI. In addition, developer experience and platform engineering are garnering
increased attention as enterprises look beyond Agile and DevOps
transformation. What all these concerns share is a reliance on measurement
to help guide decisions and track progress. And for this, qualitative
measurement is key.

Note: when we say “developer productivity”, we mean the degree to which
developers can do their work in a frictionless manner – not the individual
performance of developers. Some organizations find “developer productivity”
to be a problematic term because of the way it can be misinterpreted by
developers. We recommend that organizations use the term “developer
experience,” which has more positive connotations for developers.

What is a qualitative metric?

We define a qualitative metric as a measurement comprised of data
provided by humans. This is a practical definition – we haven’t found a
singular definition within the social sciences, and the alternative
definitions we’ve seen have flaws that we discuss later in this section.

Figure 1: Qualitative metrics are measurements derived from humans

The definition of the word “metric” is unambiguous. The term
“qualitative,” however, has no authoritative definition as noted in the
2019 journal paper What is Qualitative in
Qualitative Research:

There are many definitions of qualitative research, but if we look for
a definition that addresses its distinctive feature of being
“qualitative,” the literature across the broad field of social science is
meager. The main reason behind this article lies in the paradox, which, to
put it bluntly, is that researchers act as if they know what it is, but
they cannot formulate a coherent definition.

An alternate definition we’ve heard is that qualitative metrics measure
quality, while quantitative metrics measure quantity. We’ve found this
definition problematic for two reasons: first, the term “qualitative
metric” includes the term metric, which implies that the output is a
quantity (i.e., a measurement). Second, quality is typically measured
through ordinal scales that are translated into numerical values and
scores – which again, contradicts the definition.

Another argument we have heard is that the output of sentiment analysis
is quantitative because the analysis results in numbers. While we agree
that the data resulting from sentiment analysis is quantitative, based on
our original definition this is still a qualitative metric (i.e., a quantity
produced qualitatively) unless one were to take the position that
“qualitative metric” is altogether an oxymoron.

Aside from the problem of defining what a qualitative metric is, we’ve
also encountered problematic colloquialisms. One example is the term “soft
metric”. We caution against this phrase because it harmfully and
incorrectly implies that data collected from humans is weaker than “hard
metrics” collected from systems. We also discourage the term “subjective
metrics” because it misconstrues the fact that data collected from humans
can be either objective or subjective – as we discuss in the next section.

Qualitative metrics: Measurements derived from humans
Type | Definition | Example
Attitudinal metrics | Subjective feelings, opinions, or attitudes toward a specific subject. | How satisfied are you with your IDE, on a scale of 1–10?
Behavioral metrics | Objective facts or events pertaining to an individual’s work experience. | How long does it take for you to deploy a change to production?

Later in this article we provide guidance on how to collect and use
these measurements, but first we’ll provide a real-world example of this
approach put into practice.

Peloton is an American technology company
whose developer productivity measurement strategy centers around
qualitative metrics. To collect qualitative metrics, their organization
runs a semi-annual developer experience survey led by their Tech
Enablement & Developer Experience team, which is part of their Product
Operations organization.

Thansha Sadacharam, head of tech learning and insights, explains: “I
very strongly believe, and I think a lot of our engineers also really
appreciate this, that engineers aren’t robots, they’re humans. And just
looking at basic numbers doesn’t drive the whole story. So for us, having
a really comprehensive survey that helped us understand that entire
developer experience was really important.”

Each survey is sent to a random sample of roughly half of their developers.
With this approach, individual developers only need to participate in one
survey per year, minimizing the overall time spent on filling out surveys
while still providing a statistically significant, representative set of
results. The Tech Enablement & Developer Experience team is also responsible
for analyzing and sharing the findings from their surveys with leaders across
the organization.
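
As a rough illustration of the sampling idea (not Peloton's actual procedure, which the source does not describe), the sketch below randomly splits a roster into two halves so that each developer is invited to exactly one of the two semi-annual surveys. The function name, the `seed` parameter, and the roster identifiers are all hypothetical.

```python
import random


def split_into_survey_cohorts(developers, seed=None):
    """Randomly split a roster into two halves, one per semi-annual survey.

    Hypothetical helper: every developer lands in exactly one cohort,
    so nobody is surveyed more than once per year.
    """
    rng = random.Random(seed)
    shuffled = list(developers)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    midpoint = (len(shuffled) + 1) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Made-up roster of ten developer identifiers.
roster = [f"dev-{i:03d}" for i in range(1, 11)]
first_half, second_half = split_into_survey_cohorts(roster, seed=42)
```

Fixing the seed makes the split reproducible, which can help when someone later asks who was invited to which survey.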

For more on Peloton’s developer experience survey, listen to this
interview with Thansha Sadacharam.

Advocating for qualitative metrics

Executives are often skeptical about the reliability or usefulness of
qualitative metrics. Even highly scientific organizations like Google have
had to overcome these biases. Engineering leaders are inclined toward
system metrics since they are accustomed to working with telemetry data
for inspecting systems. However, we cannot rely on this same approach for
measuring people.

Avoid pitting qualitative and quantitative metrics against each other.

We’ve seen some organizations get into an internal “battle of the
metrics” which is not a good use of time or energy. Our advice for
champions is to avoid pitting qualitative and quantitative metrics against
each other as an either/or. It’s better to make the argument that they are
complementary tools – as we cover at the end of this article.

We’ve found that opposition to qualitative data usually stems from
misconceptions, which we address below. Later in this article, we
outline the distinct benefits of self-reported data such as its ability to
measure intangibles and surface critical context.

Misconception: Qualitative data is only subjective

Traditional workplace surveys typically focus on the subjective
opinions and feelings of their employees. Thus many engineering leaders
intuitively believe that surveys can only collect subjective data from developers.

As we describe in the following section, surveys can also capture
objective information about facts or events. Google’s DevOps Research and
Assessment (DORA) program is an excellent concrete example.

Some examples of objective survey questions:

  • How long does it take to go from code committed to code successfully
    running in production?
  • How often does your organization deploy code to production or
    release it to end users?

Misconception: Qualitative data is unreliable

One challenge of surveys is that people with all manner of backgrounds
write survey questions with no special training. As a result, many
workplace surveys do not meet the minimum standards needed to produce
reliable or valid measures. Well designed surveys, however, produce
accurate and reliable data (we provide guidance on how to do this later in
the article).

Some organizations have concerns that people may lie in surveys. This
can happen in situations where there is fear around how the data will be
used. In our experience, when surveys are deployed as a tool to help
understand and improve bottlenecks affecting developers, there is no
incentive for respondents to lie or game the system.

While it’s true that survey data isn’t always 100% accurate, we often
remind leaders that system metrics are often imperfect too. For example,
many organizations attempt to measure CI build times using data aggregated
from their pipelines, only to find that it requires significant effort to
clean the data (e.g. excluding background jobs, accounting for parallel
jobs) to produce an accurate result.
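
To make the cleaning problem concrete, here is a minimal sketch of one way such data might be tidied. It assumes a hypothetical `Job` record with start and end offsets and a `background` flag; real CI systems expose different fields. Background jobs are dropped, and overlapping foreground jobs are merged so that parallel work is not counted twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical CI job record; field names are assumptions."""
    pipeline_id: str
    start: float  # seconds since the pipeline was triggered
    end: float
    background: bool = False


def pipeline_build_seconds(jobs):
    """Per pipeline, time during which at least one foreground job ran."""
    by_pipeline = {}
    for job in jobs:
        if job.background:
            continue  # exclude background jobs entirely
        by_pipeline.setdefault(job.pipeline_id, []).append((job.start, job.end))

    durations = {}
    for pipeline_id, intervals in by_pipeline.items():
        intervals.sort()
        total = 0.0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping (parallel) job: extend the current interval.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
        durations[pipeline_id] = total
    return durations


jobs = [
    Job("p1", 0, 300),                     # build
    Job("p1", 300, 600),                   # unit tests
    Job("p1", 300, 540),                   # lint, parallel with tests
    Job("p1", 0, 3600, background=True),   # cache warmer, not part of the build
]
durations = pipeline_build_seconds(jobs)
naive_total = sum(j.end - j.start for j in jobs)
```

For this made-up pipeline, naively summing job durations gives 4440 seconds, while the cleaned figure is 600 seconds, which illustrates how far raw aggregates can drift from what developers experience.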

The two types of qualitative metrics

There are two key types of qualitative metrics:

  1. Attitudinal metrics capture subjective feelings, opinions, or
    attitudes toward a specific subject. An example of an attitudinal measure would
    be the numeric value captured in response to the question: “How satisfied are
    you with your IDE, on a scale of 1-10?”.
  2. Behavioral metrics capture objective facts or events pertaining to an
    individual’s work experiences. An example of a behavioral measure would be the
    quantity captured in response to the question: “How long does it take for you to
    deploy a change to production?”

We’ve found that most tech practitioners overlook behavioral measures
when thinking about qualitative metrics. This occurs despite the
prevalence of qualitative behavioral measures in software research, such
as Google’s DORA program mentioned earlier.

DORA publishes annual benchmarks for metrics such as lead time for
changes, deployment frequency, and change fail rate. Unbeknownst to many,
DORA’s benchmarks are captured using qualitative methods with the survey
items shown below:

Lead time

For the primary application or service you work on,
what is your lead time for changes (that is, how long does it take to go
from code committed to code successfully running in production)?

More than six months

One to six months

One week to one month

One day to one week

Less than one day

Less than one hour

Deploy frequency

For the primary application or service you
work on, how often does your organization deploy code to production or
release it to end users?

Fewer than once per six months

Between once per month and once every six months

Between once per week and once per month

Between once per day and once per week

Between once per hour and once per day

On demand (multiple deploys per day)

Change fail percentage

For the primary application or service you work on, what
percentage of changes to production or releases to users result in
degraded service (for example, lead to service impairment or service
outage) and subsequently require remediation (for example, require a
hotfix, rollback, fix forward, patch)?







Time to restore

For the primary application or service you work on, how long
does it generally take to restore service when a service incident or a
defect that impacts users occurs (for example, unplanned outage, service
impairment)?

More than six months

One to six months

One week to one month

One day to one week

Less than one day

Less than one hour

We’ve found that the ability to collect attitudinal and behavioral data
at the same time is a powerful benefit of qualitative measurement.

For example, behavioral data might show you that your release process
is fast and efficient. But only attitudinal data could tell you whether it
is smooth and painless, which has important implications for developer
burnout and retention.

To use a non-tech analogy: imagine you are feeling sick and visit a
doctor. The doctor takes your blood pressure, your temperature, your heart
rate, and they say “Well, it looks like you’re all good. There’s nothing
wrong with you.” You would be taken aback! You’d say, “Wait, I’m telling
you that something feels wrong.”

The benefits of qualitative metrics

One argument for qualitative metrics is that they avoid subjecting
developers to the feeling of “being measured” by management. While we’ve
found this to be true – especially when compared to metrics derived from
developers’ Git or Jira data – it doesn’t address the main objective
benefits that qualitative approaches can provide.

There are three main benefits of qualitative metrics when it comes to
measuring developer productivity:

Qualitative metrics allow you to measure things that are otherwise unmeasurable

System metrics like lead time and deployment volume capture what’s
happening in our pipelines or ticketing systems. But there are many more
aspects of developers’ work that need to be understood in order to improve
productivity: for example, whether developers are able to stay in the flow
of work or easily navigate their codebases. Qualitative metrics let you
measure these intangibles that are otherwise difficult or impossible to measure.

An interesting example of this is technical debt. At Google, a study to
identify metrics for technical debt included an analysis of 117 metrics
that were proposed as potential indicators. To the disappointment of
Google researchers, no single metric or combination of metrics was found
to be a valid indicator (for more on how Google measures technical debt,
listen to this interview).

While there may exist an undiscovered objective metric for technical
debt, one can suppose that this may be impossible due to the fact that
assessment of technical debt relies on the comparison between the current
state of a system or codebase versus its imagined ideal state. In other
words, human judgment is essential.

Qualitative metrics provide missing visibility across teams and systems

Metrics from ticketing systems and pipelines give us visibility into
some of the work that developers do. But this data alone cannot give us
the full story. Developers do a lot of work that’s not captured in tickets
or builds: for example, designing key features, shaping the direction of a
project, or helping a teammate get onboarded.

It’s impossible to gain visibility into all these activities through
data from our systems alone. And even if we could theoretically collect
all the data through systems, there are additional challenges to capturing
metrics through instrumentation.

One example is the difficulty of normalizing metrics across different
team workflows. For example, if you’re trying to measure how long it takes
for tasks to go from start to completion, you might try to get this data
from your ticketing tool. But individual teams often have different
workflows that make it difficult to produce an accurate metric. In
contrast, simply asking developers how long tasks typically take can be
much simpler.

Another common challenge is cross-system visibility. For example, a
small startup can measure TTR (time to restore) using just an issue
tracker such as Jira. A large organization, however, will likely need to
consolidate and cross-attribute data across planning systems and deployment
pipelines in order to gain end-to-end system visibility. This can be a
yearlong effort, whereas capturing this data from developers can provide a
baseline quickly.

Qualitative metrics provide context for quantitative data

As technologists, it is easy to focus heavily on quantitative measures.
They seem clean and clear, after all. There is a risk, however, that the
full story isn’t being told without richer data and that this may lead us
into focusing on the wrong thing.

One example of this is code review: a typical optimization is to try to
speed up the code review. This seems logical as waiting for a code review
can cause wasted time or unwanted context switching. We could measure the
time it takes for reviews to be completed and incentivize teams to improve
it. But this approach may encourage negative behavior: reviewers rushing
through reviews or developers not finding the right experts to perform reviews.

Code reviews exist for an important purpose: to ensure high quality
software is delivered. If we do a more holistic analysis – focusing on the
outcomes of the process rather than just speed – we find that optimization
of code review must ensure good code quality, mitigation of security
risks, building shared knowledge across team members, as well as ensuring
that our coworkers aren’t stuck waiting. Qualitative measures can help us
assess whether these outcomes are being met.

Another example is developer onboarding processes. Software development
is a team activity. Thus if we only measure individual output metrics such
as the rate at which new developers are committing or time to first commit,
we miss important outcomes, e.g. whether we are fully utilizing the ideas the
developers are bringing, whether they feel safe to ask questions and if
they’re collaborating with cross-functional peers.

How to capture qualitative metrics

Many tech practitioners don’t realize how difficult it is to write good
survey questions and design good survey instruments. In fact, there are
whole fields of study related to this, such as psychometrics and
industrial psychology. It is important to bring or build expertise here
when possible.

Below are a few good rules for writing surveys to avoid the most common
mistakes we see organizations make:

  • Survey items need to be carefully worded and every question should only ask
    one thing.
  • If you want to compare results between surveys, be careful about changing
    the wording of questions such that you’re measuring something different.
  • If you change any wording, you must do rigorous statistical tests.

In survey parlance, “good surveys” means “valid and reliable” or
“demonstrating good psychometric properties.” Validity is the degree to
which a survey item actually measures the construct you desire to measure.
Reliability is the degree to which a survey item produces consistent
results from your population and over time.

One way of thinking about survey design that we’ve found helpful to
tech practitioners: think of the survey response process as an algorithm
that takes place in the human mind.

When an individual is presented a survey question, a series of mental
steps take place in order to arrive at a response. The model below is from
the seminal 2012 book, The Psychology of Survey Response:

Components of the Response Process
Component: Specific Processes

Comprehension:
  • Attend to questions and instructions
  • Represent logical form of question
  • Identify question focus (information sought)
  • Link key terms to relevant concepts

Retrieval:
  • Generate retrieval strategy and cues
  • Retrieve specific, generic memories
  • Fill in missing details

Judgment:
  • Assess completeness and relevance of memories
  • Draw inferences based on accessibility
  • Integrate material retrieved
  • Make estimate based on partial retrieval

Response:
  • Map judgment onto response category
  • Edit response

Decomposing the survey response process and inspecting each step
can help us refine our inputs to produce more accurate survey results.
Developing good survey items requires rigorous design, testing, and
analysis – just like the process of designing software!

But good survey design is just one aspect of running successful surveys.
Additional challenges include participation rates, data analysis, and knowing
how to act on data. Below are some of the best practices we’ve learned.

Segment results by team and persona

A common mistake made by organizational leaders is to focus on companywide
results instead of data broken down by team and persona (e.g., role, tenure,
seniority). As previously described, developer experience is highly contextual
and can differ radically across teams or roles. Focusing only on aggregate
results can lead to overlooking problems that affect small but important
populations within the company, such as mobile developers.
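
A small sketch of why segmentation matters, using invented responses on a 1 to 5 scale. The `team` and `role` fields and the `mean_score_by` helper are hypothetical; any survey export with similar columns would work the same way.

```python
from collections import defaultdict
from statistics import mean

# Invented survey responses (1 = very difficult, 5 = very easy).
responses = [
    {"team": "web", "role": "backend", "score": 4},
    {"team": "web", "role": "backend", "score": 4},
    {"team": "web", "role": "frontend", "score": 5},
    {"team": "web", "role": "frontend", "score": 4},
    {"team": "web", "role": "backend", "score": 4},
    {"team": "web", "role": "frontend", "score": 5},
    {"team": "mobile", "role": "ios", "score": 2},
    {"team": "mobile", "role": "android", "score": 1},
]


def mean_score_by(rows, key):
    """Average score per segment, where `key` names the segmenting field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["score"])
    return {segment: round(mean(scores), 2) for segment, scores in groups.items()}


overall = mean(row["score"] for row in responses)
by_team = mean_score_by(responses, "team")
```

In this made-up data the companywide mean is 3.625, which looks acceptable, while the two mobile developers average 1.5. The aggregate alone would hide exactly the kind of small population the paragraph above describes.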

Compare results against benchmarks

Comparative analysis can help contextualize data and help drive action. For
example, developer sentiment toward code quality commonly skews negative, making
it difficult to identify true problems or gauge their magnitude. The more
actionable data point is: “are our developers more frustrated about code
quality than other teams or organizations?” Teams with lower sentiment scores
than their peers and organizations with lower scores than their industry peers
can surface notable opportunities for improvement.
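
One simple way to express this comparison, sketched under our own assumptions: take each team's score and subtract the median score of its peers. The team names and percentages below are invented, and the peer median is just one possible benchmark, not a method the source prescribes.

```python
from statistics import median


def gaps_to_peer_median(scores):
    """For each team, its score minus the median of all other teams' scores."""
    gaps = {}
    for team, score in scores.items():
        peers = [s for t, s in scores.items() if t != team]
        gaps[team] = score - median(peers)
    return gaps


# Invented percentages of favorable answers about code quality, per team.
code_quality_favorable = {"payments": 38, "search": 41, "mobile": 22, "platform": 44}
gaps = gaps_to_peer_median(code_quality_favorable)
```

Every team here scores below 50, consistent with sentiment that skews negative, so absolute values say little. The gaps show that only the hypothetical mobile team, at 19 points below its peers, stands out as a likely opportunity.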

Use transactional surveys where appropriate

Transactional surveys capture feedback during specific touchpoints or
interactions in the developer workflow. For example, platform teams can use
transactional surveys to prompt developers for feedback when they are in the midst of
creating a new service in an internal developer portal. Transactional surveys can
also augment data from periodic surveys by producing higher-frequency feedback and
more granular insights.

Avoid survey fatigue

Many organizations struggle to sustain high participation rates in surveys
over time. Lack of follow-up can cause developers to feel that
repeatedly responding to surveys is not worthwhile. It is therefore
critical that leaders and teams follow up and take meaningful action after surveys.
While a quarterly or semi-annual survey cadence is optimal for most
organizations, we’ve seen some organizations be successful with more
frequent surveys that are integrated into regular team rituals such as
retrospectives.

Survey Template

Below is a simple set of survey questions for getting started. Load the questions
below into your preferred survey tool, or get started quickly by making a copy of our ready-to-go
Google Forms template.

The template is intentionally simple, but surveys often become quite sizable as your measurement
strategy matures. For example, Shopify’s developer survey is 20 minutes
long and Google’s is over 30 minutes long.

After you’ve collected responses, score the multiple choice questions
using either mean or top box scoring. Mean scores are calculated by
assigning each option a value between 1 and 5 and taking the average.
Top box scores are calculated as the percentage of responses that
choose one of the top two most favorable options.
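
The two scoring methods described above can be sketched in a few lines. The option labels come from the first template question; the function names and the sample answers are made up for illustration, and the sketch assumes options are listed from least to most favorable.

```python
# Options ordered from least favorable (value 1) to most favorable (value 5).
OPTIONS = [
    "Very difficult",
    "Somewhat difficult",
    "Neither easy nor difficult",
    "Somewhat easy",
    "Very easy",
]


def mean_score(answers, options):
    """Assign each option its 1-based position and take the average."""
    values = [options.index(answer) + 1 for answer in answers]
    return sum(values) / len(values)


def top_box_score(answers, options, top_n=2):
    """Percentage of answers that fall in the top_n most favorable options."""
    favorable = set(options[-top_n:])
    return 100.0 * sum(answer in favorable for answer in answers) / len(answers)


# Made-up answers from five respondents.
answers = [
    "Very easy",
    "Somewhat easy",
    "Somewhat easy",
    "Neither easy nor difficult",
    "Somewhat difficult",
]
avg = mean_score(answers, OPTIONS)
top_box = top_box_score(answers, OPTIONS)
```

For these five answers the mean score is 3.6 and the top box score is 60%. Top box scores are often easier to communicate, while mean scores are more sensitive to shifts within the middle of the scale.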

Be sure to review open text responses, which can contain great
information. If you’ve collected a large number of comments, LLM tools
such as ChatGPT can be useful for extracting core themes and
suggestions. When you’ve finished analyzing results, be sure to share
your findings with respondents so their time filling out the survey
feels worthwhile.

How easy or difficult is it for you to do work as a
developer or technical contributor at [INSERT ORGANIZATION NAME]?

Very difficult

Somewhat difficult

Neither easy nor difficult

Somewhat easy

Very easy

For the primary application or service you work on, what
is your lead time for changes (that is, how long does it take to go
from code committed to code successfully running in production)?

More than one month

One week to one month

One day to one week

Less than one day

Less than one hour

How often do you feel highly productive in your work?

Never

A little of the time

Some of the time

Most of the time

All of the time

Please rate your agreement or disagreement with the following statements:

My team follows development best practices.
I have enough time for deep work.
I am satisfied with the amount of automated test coverage in
my project.
It is easy for me to deploy to production.
I am satisfied with the quality of our CI/CD tooling.
My team’s codebase is easy for me to contribute to.
The amount of technical debt on my team is appropriate based on our goals.
Specifications are continuously revisited and reprioritized according to user signals.

Please share any additional feedback on how your developer experience could be improved

[open textarea]

Using qualitative and quantitative metrics together

Qualitative metrics and quantitative metrics are complementary approaches
to measuring developer productivity. Qualitative metrics, derived from
surveys, provide a holistic view of productivity that includes both subjective
and objective measurements. Quantitative metrics, on the other hand, provide
distinct advantages as well:

  • Precision. Humans can tell you whether their CI/CD builds are generally
    fast or slow (i.e., whether durations are closer to a minute or an hour), but
    they cannot report on build times down to millisecond precision. Quantitative
    metrics are needed when a high degree of precision is required in our
    measurements.
  • Continuity. Typically, the frequency at which an organization can survey
    their developers is at most once or twice per quarter. In order to collect more
    frequent or continuous metrics, organizations must gather data
    systematically.
Ultimately, it is through the combination of qualitative and quantitative
metrics, a mixed-methods approach, that organizations can gain maximum
visibility into the productivity and experience of developers. So how do you
use qualitative and quantitative metrics together?

We’ve seen organizations find success when they start with qualitative
metrics to establish baselines and determine where to focus. Then, follow with
quantitative metrics to help drill deeper into specific areas.

Engineering leaders find this approach to be effective because qualitative
metrics provide a holistic view and context, giving a wide understanding of
potential opportunities. Quantitative metrics, on the other hand, are
typically only available for a narrower set of the software delivery
process.

Google similarly advises its engineering leaders to go to survey data first
before looking at logs data for this reason. Google engineering researcher
Ciera Jaspan explains: “We encourage leaders to go to the survey data first,
because if you only look at logs data it doesn’t really tell you whether
something is good or bad. For example, we have a metric that tracks the time
to make a change, but that number is useless by itself. You don’t know, is
this a good thing? Is it a bad thing? Do we have a problem?”.

A mixed methods approach allows us to take advantage of the benefits of
both qualitative and quantitative metrics while getting a full understanding of
developer productivity:

  1. Start with qualitative data to identify your top opportunities
  2. Once you know what you want to improve, use quantitative metrics to
    drill in further
  3. Track your progress using both qualitative and quantitative metrics

It is only by combining as much data as possible – both qualitative and
quantitative – that organizations can begin to build a full understanding of
developer productivity.

In the end, however, it’s important to remember: organizations spend a lot
on highly qualified humans who can observe and detect problems that log-based
metrics can’t. By tapping into the minds and voices of developers,
organizations can unlock insights previously seen as impossible.


