Monitoring and QA are the Same Thing? (Part 1)

Many people think monitoring and QA are the same thing. They suggest that, ultimately, we should be able to run the same QA test automation in production as a monitoring approach, to verify whether everything is working. Some said,

“your monitoring is doing comprehensive semantics checking of your entire range of services and data, at which point it’s indistinguishable from automated QA.”

I kind of disagree with that, for two reasons.

Reason 1. There are three monitoring approaches: 1) invariants, 2) metrics & logs, 3) synthetic transactions. By suggesting that monitoring and QA test automation are the same thing, people are equating monitoring with synthetic transactions. However, the synthetic transaction approach alone is not sufficient for the needs of live site monitoring. Synthetic transactions have some limitations:

  1. Representativeness. The fact that the test automation can provision a virtual machine in production doesn’t guarantee that all users can do so too.
  2. Cost. Every time the test automation provisions a new virtual machine, there is a cost on the system across the layers. As we increase the coverage of the monitoring and reduce the MTTD (mean time to detect), that overhead becomes significant. In other words, using QA test automation for monitoring is not very economical. Plus, the cost increases exponentially and the marginal return on investment drops as we use synthetic transactions to cover more granular scenarios. In the past, I have seen multiple times how groups went down this slippery slope:
    • In the beginning, the team’s live site monitoring only provisioned one Small-size Windows Server 2008 virtual machine in West US every 10 minutes.
    • Later, a live site bug came along that only affected virtual machines running Windows Server 2012. So instead of covering only Windows Server 2008, the team changed to cover all OS types, including a few mainstream Linux distros. That increased the number of synthetic virtual machines from one every 10 minutes to half a dozen every 10 minutes.
    • Then another live site bug caused virtual machine provisioning to fail in some regions but not others. Unfortunately, the synthetic transactions didn’t catch it because they only created virtual machines in West US. So the team changed to cover all regions rather than just West US. That increased the number of synthetic virtual machines by another 10x.
    • The team was challenged to reduce the MTTD. They figured that the 10-minute interval was too long and reduced it to every 5 minutes. That doubled the number of synthetic virtual machines.
    • The team also ran into a live site issue that only affected the A8 and A9 sizes. So on and so forth. Within a year or two, the number of synthetic virtual machines grew from one to hundreds (a rough sketch of this combinatorial growth follows the list).
  3. Blind spots. Running synthetic transactions verifies the scenarios you know about and catches issues in those scenarios. But there are other scenarios (often corner cases) that you don’t know about or don’t expect to behave differently. Synthetic transactions won’t cover those, and those are exactly the places where some nasty live site incidents happen.
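
To put rough numbers on the cost point above, here is a back-of-the-envelope sketch in C#. The dimension counts are hypothetical, not the actual numbers from the story, but they show how each new coverage dimension multiplies the synthetic workload:

using System;

class SyntheticCoverageGrowth
{
   static void Main()
   {
      // Hypothetical dimension counts, for illustration only.
      int osTypes = 6;        // Windows Server 2008/2012 plus a few Linux distros
      int regions = 10;       // all regions instead of just West US
      int vmSizes = 15;       // Small, A8, A9, ...
      int runsPerHour = 12;   // every 5 minutes instead of every 10

      // Each coverage dimension multiplies the number of synthetic VMs provisioned.
      int vmsPerHour = osTypes * regions * vmSizes * runsPerHour;
      Console.WriteLine($"Synthetic VMs per hour: {vmsPerHour}");   // 10800, versus 6 at the start
   }
}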

I believe that in live site monitoring, the other two approaches (invariants and metrics & logs) are needed to compensate for these limitations of synthetic transactions. All three approaches are useful, and we need to choose one, or a combination of two or three, for different purposes.

Reason 2. There are some important differences between the synthetic transactions used in monitoring and the test cases in QA test automation. By suggesting that monitoring and QA test automation are the same thing, people are equating synthetic transactions with QA test automation. That’s not the case. Leaving aside the fact that not all test cases can run in production, not all test cases need to run in production either. A couple of examples:

  • In QA test automation, a test case will verify that provisioning a new virtual machine must fail when the size is not in the supported sizes list (e.g. the output of the ListSizes API call). Once our code has passed this test case in the in-house test pass, we believe it will behave the same way in production.
  • In QA test automation, a test case will verify that a customer can create a new virtual machine in the new A10 size if and only if the customer has enrolled in the “LargeVirtualMachine” beta feature. Once our code has passed this test case in the in-house test pass, we believe that when the code is running in production, it will also correctly honor the enrollment status of the beta feature.

In both examples, we don’t need to run these test cases in production as part of the live site monitoring.
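
As an illustration of the first example, an in-house test case might look roughly like this. This is a sketch only: the client object, its ListSizesAsync/ProvisionAsync methods and UnsupportedSizeException are hypothetical names, with xUnit-style assertions:

[Fact]
public async Task Provisioning_MustFail_ForUnsupportedSize()
{
   // "A999" is assumed not to be among the supported sizes returned by ListSizes.
   var supportedSizes = await client.ListSizesAsync();
   Assert.DoesNotContain("A999", supportedSizes);

   // Provisioning with an unsupported size must be rejected.
   await Assert.ThrowsAsync<UnsupportedSizeException>(
      () => client.ProvisionAsync(osImage: "WindowsServer2012", size: "A999"));
}

The point is that this check exercises pure product logic: once it passes in the in-house test pass, running it again in production adds little.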

The chart below illustrates the two reasons above (not drawn to scale):

monitoring != QA test automation

To be continued … (Part 2, Part 3)

Make Sense of the Test Results

Test automation per se is pretty common today. Few software companies are still hiring an army of manual testers (either regular employees or contractors).

However, the test results coming out of test automation are usually still pretty raw. A typical test report today tells you that X test cases failed and Y passed. For each failed test case, the report shows a brief error message (e.g. “AssertionFailed Timed Out waiting for status to be [Started]”) with a hyperlink to the full detailed test log. That’s pretty much it. Beyond that point, from what I can see, people in many organizations spend lots of time making sense of the test results.

They want to figure out things like:

  • Among the failed test cases, are the causes all the same, or is there a different cause for each failure?
  • Which failures are new, and which are chronic failures or flaky tests? (see the sketch after this list)
  • For a new failure, can I quickly narrow it down to one or two suspicious recent checkins?
  • For a chronic failure, is this occurrence the same kind as in the past, or is the same test case failing for a different cause?
  • Is the failure already being tracked by a bug? Is someone already working on it?
  • Is the failure supposed to have been fixed already? If so, the tracking bug should be reactivated, since the fix doesn’t seem to have worked.
  • Is the failure unique to my branch, or is it happening across the board? If it fails in other branches at the same time, it’s unlikely to be caused by changes in my branch and more likely an environment issue.
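
Much of the triage above amounts to normalizing error messages into signatures and comparing them against history. Here is a minimal sketch; the TestFailure type and the normalization rules are made up for illustration:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

record TestFailure(string TestCase, string ErrorMessage);

static class FailureTriage
{
   // Strip volatile details (numbers, GUID-like runs) so that "Timed Out after 93s"
   // and "Timed Out after 120s" collapse into the same signature.
   public static string Signature(string error) =>
      Regex.Replace(error, @"[0-9a-fA-F]{8,}|\d+", "#");

   public static void Summarize(IEnumerable<TestFailure> failures, ISet<string> knownSignatures)
   {
      foreach (var group in failures.GroupBy(f => Signature(f.ErrorMessage)))
      {
         string status = knownSignatures.Contains(group.Key) ? "chronic/known" : "NEW";
         Console.WriteLine($"{status}: {group.Count()} failure(s), signature \"{group.Key}\"");
      }
   }

   static void Main()
   {
      var failures = new[]
      {
         new TestFailure("CreateVm", "AssertionFailed Timed Out after 93s waiting for status to be [Started]"),
         new TestFailure("DeleteVm", "AssertionFailed Timed Out after 120s waiting for status to be [Started]"),
         new TestFailure("ResizeVm", "NullReferenceException in VmClient.Resize"),
      };
      var known = new HashSet<string> { Signature("AssertionFailed Timed Out after 7s waiting for status to be [Started]") };
      Summarize(failures, known);   // the two timeouts group together as known; the third shows up as NEW
   }
}

Feeding it yesterday’s signatures as knownSignatures separates new failures from chronic ones; joining signatures against the bug tracker would answer the “already tracked?” question the same way.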

Besides understanding the failures, the engineers also care about the quality of the test automation:

  • Is the test pass taking longer to finish than before? If so, why? Is it because: a) we have more test cases, b) the system under test is now slower, c) a dependency of the system under test is now slower, d) we are reaching the capacity limit of the test environment so things are getting queued/throttled, etc.?
  • How repeatable is the test automation? Which tests are the flakiest, and why?

Without help from good tools, these analyses are laborious in many places today. There is no reason why we can’t have machines perform them and put all the answers right in front of us soon after a test pass finishes. That shouldn’t be too hard. Ideally, the machine would simply tell us whether the build is good to ship or not. It’s very much like going to the hospital for a general check-up. I don’t want to just get a pile of papers full of numbers and charts, because I don’t know how to interpret them: is 42 good or bad for an HDL cholesterol level? If bad, how bad? I have premature contractions? What does that mean? In the end, I just want to be told, “you are doing fine, you just need to lose some weight.”

——

p.s. This reminds me of a TED talk that I watched recently. The speaker, Kenneth Cukier, said:

Big data is going to steal our jobs. Big data and algorithms are going to challenge white collar, professional knowledge work in the 21st century in the same way that factory automation and the assembly line challenged blue collar labor in the 20th century. Think about a lab technician who is looking through a microscope at a cancer biopsy and determining whether it’s cancerous or not. The person went to university. The person buys property. He or she votes. He or she is a stakeholder in society. And that person’s job, as well as an entire fleet of professionals like that person, is going to find that their jobs are radically changed or actually completely eliminated.

Well, long before machines can tell cancer cells from good cells and kill the lab technician’s job, we should be able to make machines help us make sense of the test results.

My Favorite Coding Questions

I was going through the latest posts on Hacker News and Quora tonight and noticed a few about coding questions. I really don’t like some of those coding questions, the ones that ask you to “design an algorithm to …”, although they are among many people’s favorites. For example:

Given a string s1 and a string s2, write a snippet to determine whether s2 is a rotation of s1, using only one call to the strstr routine. (e.g. given s1 = ABCD and s2 = CDAB, return true; given s1 = ABCD and s2 = ACBD, return false)

I know the answer. I heard this problem from my wife when she was preparing for her last job change. I thought hard about it for several minutes and had no clue. Then my wife told me the answer. It left me wondering what the point is of asking such questions in job interviews to hire programmers. It doesn’t tell me much about the person’s methodology for exploring possible solutions. It doesn’t tell me much about whether the person is a good problem solver — a puzzle solver != a problem solver. And if the person happens to already know the answer (just like I do), the question becomes worthless.
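
For the record, the trick is that s2 is a rotation of s1 exactly when the two strings have the same length and s2 appears inside s1 concatenated with itself. A C# sketch (using Contains in place of strstr):

// e.g. s1 = "ABCD": "ABCDABCD" contains "CDAB" but not "ACBD".
static bool IsRotation(string s1, string s2) =>
   s1.Length == s2.Length && (s1 + s1).Contains(s2);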

When I do coding interviews for developers, whether college candidates or industry candidates with several years of experience, my favorite questions are those with a very straightforward algorithm. For example:

1. Search for a number in a rotated sorted array.
2. Partially reverse a linked list.

These kinds of questions help me find out whether the person can translate ideas into code quickly and correctly, which is the most common task in our daily work as developers. Most of the time, we already know how to solve the problem on paper; the remaining work is to turn it into code that a computer can execute (to solve the problem for real). It’s just like in restaurants: most cooks’ daily job is to translate recipes into dishes, quickly and correctly. Only occasionally do they need to come up with new recipes.
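
For what it’s worth, here is roughly what I expect for the first question (a sketch assuming distinct elements): a plain binary search, with one extra check to see which half is currently sorted.

// Search target in a sorted array rotated at an unknown pivot, e.g. {4,5,6,7,0,1,2}.
// Assumes distinct elements. Returns the index of target, or -1 if not found.
static int SearchRotated(int[] nums, int target)
{
   int lo = 0, hi = nums.Length - 1;
   while (lo <= hi)
   {
      int mid = lo + (hi - lo) / 2;
      if (nums[mid] == target) return mid;

      if (nums[lo] <= nums[mid])                        // left half is sorted
      {
         if (nums[lo] <= target && target < nums[mid]) hi = mid - 1;
         else lo = mid + 1;
      }
      else                                              // right half is sorted
      {
         if (nums[mid] < target && target <= nums[hi]) lo = mid + 1;
         else hi = mid - 1;
      }
   }
   return -1;
}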

The Everyday-Everyone Quadrants

This is a simple model that I came up with on my own and have been using to judge business ideas I’ve had or seen. I’m sure other people use similar models.

In my model, the Everyday-Everyone Quadrants model, there are two dimensions:

  • X-axis: would people use it on a daily basis, or only once in a while?
  • Y-axis: would this be used by everybody, or only a specific group of people?

Quadrant I holds the products that would be (at least in theory) used by nearly everybody out there, on a daily basis (or at least on a regular basis, like a few times a week or a month). The companies behind these products can reach market caps in the hundreds of billions. Everybody wishes they had an idea in this quadrant and could turn it into a business. On the other hand, it is of course very hard to build something that everybody would want to use every day.

The Everyday-Everyone Quadrants Model

Neighboring Quadrant I, Quadrants II and IV each take one element out of the “everyone-everyday” formula: either the “everyone” part or the “everyday” part.

Quadrant II holds the “everyone-but-not-everyday” products, such as LinkedIn, Zillow, Expedia and Angie’s List. That’s quite obvious: most of us only change jobs every couple of years (LinkedIn), only buy/sell/rent a home every few years (Zillow), only travel a couple of times a year (Expedia) and only look for contractors when something needs to be fixed (Angie’s List). It’s worth noting that some players in Quadrant II want to move into Quadrant I by getting users to use their product more often. Take LinkedIn: they say you should visit http://www.linkedin.com more frequently, not only when you want to change jobs. LinkedIn puts more emphasis on being a “professional social network.” They tell people you can be a passive job seeker by maintaining your LinkedIn profile well. They are trying to become a professional publishing platform, starting with Influencers and later opening up to all members.

In contrast to Quadrant II, Quadrant IV is “everyday-but-not-for-everyone”: specialty products (StackOverflow and GitHub for developers) or products targeting a specific interest group (Twitch for gamers and Leafly for smokers). If done right, companies in Quadrants II and IV can reach market caps in the tens of billions, or at least the billion-dollar level. For sure, their sizes can’t compare to those in Quadrant I: the products in Quadrants II and IV are either used less frequently or used by fewer people.

Quadrant III is kind of the “Do Not Enter” zone: if a product is only needed by a small group of people, and they only use it once in a while, why would it become a sizable business? There are some successful products in Quadrant III, but in general its potential is much smaller than in the other three quadrants, and the return is smaller too. From time to time an idea pops into my head, and after examining it with the Everyday-Everyone Quadrants model I find it falls into Quadrant III. Then I tell myself to forget about it.

“Signoff” Is the Word to Prohibit

Recently I saw an email thread in which people on a team were asked to sign off on a hotfix. The BVT run of that hotfix passed only 97% of its test cases (11 failures out of 380 in total). I told them that something in the team’s engineering practice is wrong when such a thread exists, asking people to sign off on the BVT result of a hotfix.

That is because:

  • The only reason there is a signoff request is that the BVT is not passing 100%. If it were passing 100%, or had just one failure, they wouldn’t need such a thread.
  • A BVT should be passing 100% all the time, or have just 1-2 intermittent failures at worst. A BVT with 11 failures (97% passing) is a big problem.
  • It’s an even bigger problem that the BVT is only 97% passing on a hotfix. The code (including the test automation) in a hotfix branch should be of very good quality and very reliable.
  • If the 97% result were due to a test environment issue, they should have quickly fixed the environment and gotten another BVT run (which should have returned results within a couple of hours at most). It would be yet another problem if they weren’t able to quickly fix a test environment issue that caused that many BVT failures.

Actually, not just in this specific team and not just on BVT results: no modern, established engineering team should be doing any kind of signoff. Signoff means making decisions heavily based on human judgment, which shouldn’t be in the equation when determining whether a piece of code is good enough to ship. Signoff should be prohibited in day-to-day software engineering work. The use of the word “signoff” usually indicates issues in the team’s engineering practice and culture.

Instead of “signoff”, the right word to use is “override”. It’s like at the airport: passengers go through the screening machines, and only if a machine beeps does an officer do a pat-down. After the pat-down, the officer can let us go (overriding the machine’s outcome) or send us back if they do find something suspicious (confirming it). Shipping software should work the same way: the code just goes out the door when the automated validation says “Good”. We stop the train only when the test automation beeps; then comes human inspection and judgment, and we can still let the code ship (overriding) or pull it back if we do find something suspicious.
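
A minimal sketch of that gate in C# (all type names here are made up for illustration): ship automatically on green; on red, block by default unless a human explicitly records an override.

enum Verdict { Ship, Blocked }

record ValidationResult(bool Passed, string Details);
record OverrideRecord(string Approver, string Justification);

static class ReleaseGate
{
   // Green ships without asking anyone. Red is blocked by default; a human may
   // explicitly override, and the override (who and why) is recorded.
   public static Verdict Decide(ValidationResult validation, OverrideRecord overrideRecord = null) =>
      validation.Passed || overrideRecord != null ? Verdict.Ship : Verdict.Blocked;
}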

Of course, test automation may be more complicated than a screening machine. For a sizable product, there is a whole lot of work to do to make sure the test automation beeps neither too often nor too rarely.

LeetCode Is Like Doping in Sports

How to prepare for coding interviews is an open secret nowadays.

Search for LeetCode on GitHub. A software engineer who is serious about changing jobs just needs to commit a couple of hours a day for a month or two, going through the one-hundred-seventy-ish coding problems on LeetCode two or three times. Then he is good to go. As long as he has been a workable developer before, with the help of LeetCode and the like, plus books like “Cracking the Coding Interview”, the chance is very high that he will pass the coding interviews at nearly all software companies (including Facebook, LinkedIn, Amazon, Microsoft, etc., with Google probably the only exception) — although some may still fail to land an offer for other reasons, like design (e.g. “Tell me, how would you design Facebook’s Timeline?”) or communication.

LeetCode and the like are like doping in sports (legality aside). They are very effective boosters that raise your performance in coding interviews. But boosters only last so long; before long, you go back to who you truly are. As a result, in recent years I have repeatedly seen developers write production code with bugs that would never pass a coding interview. Here are a few real examples of such bugs (with necessary obfuscation and slight simplification for formatting and other obvious reasons), all of which previously caused (sometimes very expensive) live site incidents — how they slipped through code review and went uncaught by unit tests is a separate topic.

Example 1: what if the length of the tokens array is 1?

internal static IPSubnet ParseSubnet(string subnetString)
{
   string[] tokens = subnetString.Split(IPHelpers.PrefixDelimiter);
   IPSubnet subnet = new IPSubnet();
   subnet.Prefix = byte.Parse(tokens[1]);
   //more code...

   return subnet;
}
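
For contrast, the “what if” guard that interview preparation drills so well would look something like this (a sketch; IPSubnet and IPHelpers are the same internal types as in the snippet above):

internal static IPSubnet ParseSubnet(string subnetString)
{
   string[] tokens = subnetString.Split(IPHelpers.PrefixDelimiter);

   // Guard against input with no prefix part, e.g. "10.0.0.0" instead of "10.0.0.0/24".
   if (tokens.Length < 2)
   {
      throw new ArgumentException("Subnet string is missing a prefix: " + subnetString);
   }

   IPSubnet subnet = new IPSubnet();
   subnet.Prefix = byte.Parse(tokens[1]);
   //more code...

   return subnet;
}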

Example 2: what if link doesn’t contain “.vhd”?

private string GetImageName (Uri link)
{
   string str = link.ToString();
   int startPos = str.LastIndexOf("/") + 1;
   int endPos = str.LastIndexOf(".vhd", StringComparison.OrdinalIgnoreCase);
   return str.Substring(startPos, endPos - startPos);
}

Example 3: what if bufferSize is 0?

public static int GetIndex(int bufferSize)
{
   int index = 0;
   while ((bufferSize & 1) == 0)
   {
      index++;
      bufferSize = bufferSize >> 1;
   }
   return index;
}

Example 4: what if fromTime is Feb.29, 2012?

// Setting certificate expiry.
endTime.wYear = fromTime.wYear + 1;
endTime.wMonth = fromTime.wMonth;
endTime.wDay = fromTime.wDay;
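
For this last example, letting a date library do the year arithmetic sidesteps the leap-year trap entirely. A C# sketch (the original snippet looks like Win32 SYSTEMTIME code, so this is an equivalent illustration rather than the actual fix):

// AddYears clamps Feb 29 to Feb 28 when the target year is not a leap year,
// instead of producing an invalid date like 2013-02-29.
DateTime fromTime = new DateTime(2012, 2, 29);
DateTime endTime = fromTime.AddYears(1);   // 2013-02-28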

I am sure the engineers who wrote these bugs weren’t like this during their coding interviews. It’s just that in their day-to-day work, they don’t ask themselves these “what if” questions the way they would have during interviews. That’s because it’s not a part of them; they don’t have that habit. Their interview performance was the result of preparation and the boosters. As a result, it’s very sad that in today’s tech industry, people produce their best-quality code only during job interviews.

Of course, software engineers must be able to write solid code really quickly, just like soccer players must run fast, basketball players must jump high and chefs must be able to slice onions very thin. Good scouts make sure the players run that fast or jump that high year-round, and can tell whether a guy’s performance today is due to five cans of Red Bull. The challenge for hiring in the tech industry is to exclude the influence of the boosters and find out what the candidate is really capable of in day-to-day work.

How Do We Measure An Engineering Practice

Note: in this article I raise a question without answering it. Rather, I want to point out that we have a problem to work on, and that there is a large space for new innovation and opportunity.

The problem is: how do we measure an engineering practice? How do we prove it works or doesn’t work; how do we prove it’s superior to other approaches; and so on.

For example:

  • How do we measure hackathons? How do we prove they work; how do we prove that a hackathon brings value we wouldn’t get in other ways?
  • How do we measure the return on investment in unit testing?
  • How do we measure DevOps? How do we prove that it’s superior to other models?
  • … (the list can go very long)

Take hackathons as an example. There are a lot of arguments in favor of them. Most are subjective, based on personal feeling and observation, and appeal to perception and reasoning rather than numbers. For example, some say “lots of good ideas emerged from hackathons.” Then the question is: would these ideas have emerged anyway in other ways? How do you prove that some of them would never have emerged without the hackathon? For another example, some say “hackathons boost team morale.” Then the question is: how do you measure the boost? By sending a survey to the participants? Inside companies, such surveys are not to be trusted; people are afraid of saying negative things, especially about things arranged, sponsored or initiated by top executives. If someone says “we found good hires through the hackathon,” the question is: could you have found good hires equally effectively (if not more effectively) in other ways? Some arguments are simply logically flawed: “Facebook does hackathons, Facebook is successful, so hackathons must help make a software company successful.”

Or take unit testing. If someone uses “unit tests will help cut bug count by up to 90%” to support unit testing, the number will get questioned really hard, since no two projects are the same. In many other fields, such as sociology, we can examine tens of thousands of similar samples and use statistical regression to show that “being exposed to a bilingual environment from age 0-6 boosts annual income at age 30 by 2.5%.” That kind of study and those kinds of numbers are sound and solid. But we can’t do that for software engineering. We don’t have thousands of similar software companies or projects to study and produce numbers like “companies that heavily invest in unit testing see a 5% boost in revenue.” Many people (including me) strongly believe in unit testing and practice it in our daily work, not because we were convinced by statements like “smoking increases the chance of cancer by x%,” but because we tried it and found it helpful.

Or look at DevOps. There are some data suggesting it’s a good idea, but the data are not solid enough to pass strict examination. “Job satisfaction increased by 10%” — according to what, some internal survey? We know people “engineer” such surveys to get the result they want to see. “Release cycle shortened by 50%” — could that have been achieved equally effectively without DevOps? “Live site incidents reduced by 30%” — maybe the team was already on a steady trend toward higher efficiency. It’s a common logical fallacy to assume that when two things happen in sequence, the first is the cause of the second.

By questioning these, I’m not arguing that we shouldn’t be searching for and trying out new ways in software engineering to be more effective and produce better outcomes. We should never stop searching, trying and getting better.

Even when we think we are becoming more data-driven in adopting new practices, our approach to advancing engineering practice is still very pragmatic today: some people get ideas and go ahead and try them. Sometimes it works, sometimes it doesn’t. When it works, they tell people. Others try it, see the benefit, and tell more people. More people follow suit, and over time it becomes common and popular, until eventually everybody adopts it as the standard way of doing things. When we push ourselves to become more data-driven, we should realize that the data we use are often weak evidence, misuses of statistics, or subject to interpretation.

A Manager Must Hide The Disagreement?

I came across a blog post about being a manager. The author believes that a manager can disagree up but can’t disagree down, and that if the manager disagrees with the company’s decision, he must hide that disagreement from his team. Speaking from my own experience (nearly 10 years as a manager), that’s wrong.

For completeness, here are the original words from the blog:

It is your responsibility as a manager to support the company’s decisions. Not just to execute on them, but to support them, to communicate that support, and if you disagree then you must hide that disagreement in the service of the company. You can disagree up — though even that is fraught with danger — but you can’t disagree down. You must hold yourself apart from your team, putting a wall between you and your team. To your team you are the company, not a peer.

The impact of complaints filtering up is much different than the impact of complaints filtering down. In some sense as a manager you must manufacture your own consensus for decisions that you cannot affect. You are probably doing your reports a favor by positively communicating decisions, as they will be doing themselves a favor by positively engaging with those decisions. But their advice is clear: if you are asked your opinion, you must agree with the decision, maybe stoically, but you must agree, not just concede. You must speak for the company, not for yourself.

The right way is to “disagree and say yes”. There are two reasons why a manager should do so, in front of his manager as well as his team:

  1. You can’t fake it. We are all grown-ups and we are all smart people (with college degrees, working at good companies after tough interviews, having stood out from all the other job seekers). We don’t bullshit each other, because we can easily tell, and when someone does, we tell him, “don’t give me the bullshit, treat me like an adult.” We do socially lie (“hey, your daughter is so cute”), but we all know what that is. We have to be ourselves, or it becomes very difficult and stressful to keep telling the lies. Being a manager is a particularly stressful job (you not only have to take care of yourself, but also of a number of people who report to you, directly or indirectly), so you don’t want to add more stress to it, or you will burn out very quickly.
  2. You want to be treated the same way by your team when you ask them to do something, or to do things in a way, that they disagree with. At work, sometimes it’s dirty work that I got from my manager (who may have gotten it from his manager, and so on) and I just need someone on my team to get it done. Sometimes it’s the right thing to do but uncomfortable — just like getting my boy to brush his teeth. On all those occasions, I want my team to just say yes and do it, even though they disagree. To build a team like that, the key is to show them that the manager does the same for his own manager, just as one day our children will treat us the same way we treat our own parents today.

At the end of the day, if someone finds himself in disagreement with the company or his management chain too often, it’s a sign that he should find another company or another team.

Comment on Box’s Flaky

Flaky is a test tool that Box shared with the community about a year ago. In my opinion and my own experience, the tool solves the problem it set out to solve, but it’s a poisonous tool for any engineering organization that wants sustainable success. Here is the comment I left on their blog post:

I was in the same spot in the past: a few years back, we had tests that failed intermittently, and sometimes the causes were external to our component, in other services owned by folks on another floor or in another building. Naturally, we built a similar mechanism to rerun the failed tests automatically.

Later, I found a problem with this approach: it discouraged people from doing deep investigation and hid some bugs in our own component. Discouraged: as long as the rerun passed, no one would look at the failure in the first run. Hid bugs: there are product bugs that are intermittent in nature (rather than causing some functionality to fail 100% of the time); there were also test automation bugs that genuinely caused intermittent failures (lack of test repeatability, or test automation reliability issues). It was also a slippery slope: over time, the amount of rerunning increased (since no one spent time looking at why tests failed on the first attempt), which caused the total duration of the test pass to increase.

Seeing those problems, I stopped the rerun. I told the team that because our component is deterministic in nature (unlike fuzzy-logic products such as face recognition, speech, relevancy, machine learning, …), our tests should be deterministic and highly repeatable. I forced the team to investigate every intermittent failure. It turned out we found a lot of genuine issues in both the product code and the test automation, and we fixed them. I also tenaciously pushed all the teams (not only my own, but also folks on other floors and in other buildings) to improve the design and architecture so that it would be easier to write more repeatable test automation. It paid off pretty well. About a year after stopping the rerun, the amount of flaky tests had dropped significantly (from more than 5% of the entire test automation to under 0.5%), the total duration of the test pass had dropped, and people were much less frustrated by dealing with intermittent failures all the time.

In short, having a tool to automatically rerun failed tests is poisonous. It makes life easier now, but it sends your engineering in the wrong direction.

My Parenting Philosophy

Everyone has their own parenting philosophy. Some parents hope their child grows up to be a caring person; some parents train their child to endure hardship and work hard. As for me, what I most want is for 郑轶嘉 to learn to decide for himself what to do, and to stand behind his own choices.

“Learning to decide for himself what to do” means:

  1. Knowing what he wants.

    In my daily conversations with 郑轶嘉, I have always avoided the word “听话” (be obedient). Although following instructions is important, and is an important social skill, there are some important and subtle differences between the Chinese “听话” and the English “follow instructions.” “听话” carries a hint of not using your own brain. It also more or less runs against cultivating independent thinking, which in turn leads to simply echoing what others say. A typical example of “听话”: when many people talk about their past, their parents played the decisive role — when applying for college, it was my parents who made me pick this major; when looking for a job, it was my parents who made me take the civil service exam; when looking for a girlfriend, it was my parents who preferred this one over that one, …

    Knowing what he wants also means knowing what makes him happy. One source of many people’s unhappiness is that they don’t know what makes them happy — which is obvious: when choosing a restaurant and ordering dishes, if you don’t know what you like to eat, you are much more likely to end up with a bad meal.

  2. Not being indecisive.

    As the saying goes, you can’t have both the fish and the bear’s paw; you can’t have your cake and eat it too. Everyone has only 24 hours a day, and what you can do in those 24 hours is limited: choosing to watch one more hour of TV means taking one hour away from something else (like sleep or exercise). Most people’s disposable wealth is also limited: a year’s salary is only so much, and spending five thousand on a new phone means spending five thousand less on everything else. The nicer names for this are time management, priority management, personal finance and career planning. Strip those away and it’s all multiple-choice questions: every gain comes with a loss. Knowing what he wants, and knowing what makes him happy, will also help him decide what to gain and what to give up.

    Many people suffer from “decision paralysis” (legend has it the ratio is especially high among Libras). The attitude I admire is “no hesitation before the decision, no regret after it.” Besides, indecisive people come across as lacking confidence, and in school and at work, a lack of confidence costs you a lot of points. Making choices is a skill, and skills can be cultivated from an early age.

  3. Understanding the consequences of his own actions.

    We want 郑轶嘉 to never do anything blindly, but to clearly understand the results his actions will lead to. When he kicks against his chair, we tell him he will fall. If he keeps kicking, we let him fall, enough to hurt (but not enough to injure him, which is actually quite hard to calibrate). That way he remembers. Learning to walk by falling down — that is basically how children learn.

    We don’t stop him from eating chili sauce or wasabi; once it burns, he remembers. And after enough repetitions, he picks up a generalized piece of knowledge: whatever Mom and Dad advise him not to eat really isn’t very tasty. Sometimes he underdresses, and we tell him that wearing too little may give him a cold and a fever. If he still insists, fine, don’t wear it; when he actually gets a fever, he will remember the lesson. Likewise, we never chase after him with a spoon at mealtime. At every meal we make it very clear: if you don’t eat this meal properly, that’s fine, but when you get hungry in the afternoon there will be no cookies; you will have to wait until dinner. The price of doing this is that his height is only average among his peers, but we believe that teaching him the consequences of his own actions matters more than height.

    In the future, I also hope he will clearly understand some much bigger consequences. For example: not wearing a condom can get a girl pregnant, and if the baby is born, it will more or less ruin his own life. Or, for another example, what the possible outcomes are of taking a pile of nude photos and keeping them on his phone and computer. For adults, the closest English phrase is “make educated decisions.”

“Standing behind his own choices” means:

  1. No regrets.

    There is no medicine for regret. Life can’t be saved and loaded like a video game. Regret is useless; it changes nothing. It only adds to your unhappiness, only adds to your sense of frustration, and spreads that negative energy to the people around you.

    Rather than regretting, keep a close watch on the decisions you have made, observe how things actually play out, and make the necessary adjustments. On the other hand, for every major decision, learn to think it through as thoroughly as possible, manage the risk (don’t go all-in lightly), and have a plan B. It’s like playing mahjong: even if you have decided to go for a flush, don’t discard the pairs in the other suits just yet. If a few turns in you find you’ve been dealt a second-rate hand, you still have a chance to change direction and go for all-pungs instead.

  2. Keeping his word.

    Every time we go to a buffet, we let 郑轶嘉 choose what he wants to eat. Before putting any dish on his plate, I confirm with him again: “Are you sure you want to eat this?” and “If you take it you have to eat it — do you promise you will finish it?” After he sits down with his food and starts eating, we hold him to the promise he just made. Sometimes a plate of breakfast takes more than an hour, just to give him the time to do what he said he would do — in our view, “wasting” that time is worth it.

    In my view, whether you live in a shameless society that mocks the poor rather than the crooked, or in a society that values trustworthiness, and whether or not the social system punishes or rewards people who keep their word, for me personally, going back on one’s word is one of the human traits I can least accept. I cannot accept my child being such a person.

  3. Bearing the consequences of his own actions.

    In Chinese, “bearing the consequences” carries a slightly negative connotation; in English, “consequence” or “outcome” is a bit more neutral. In general, whatever the outcome, good or bad, you own it yourself: the good you can pocket with a clear conscience, the bad you clean up yourself. Don’t expect others to clean up your mess. Your own toys you clean up yourself; whatever you threw on the floor you pick up yourself.

    To put it another way: if you place the bet, accept the loss.

I hope this will raise 郑轶嘉 into a person with inner strength. I hope that if one day he chooses computing as his profession, it will be because he knows he can get what he wants from it, because what he wants is what makes him happy, and because he clearly understands what choosing this profession means gaining and giving up. I hope that one day in the future, when he has gotten from this profession what he wanted, he will be happy, even if by then his job is much harder than a civil servant’s and, on the other hand, pays much less than investment banking.

The Maximum Possible Size of A Meatball

A few years back, I read a ThinkWeek paper about engineering waste at Microsoft. It painted this picture: from bug management to test case management, from continuous integration to deployment automation, for every problem in software engineering there are a handful of different tools inside Microsoft solving the same problem. It did look awful. The author advocated consolidating engineering systems within Microsoft, and I was in the same camp.

But since then, I have realized more and more that Microsoft’s problem is far from unique. It’s quite common in the software industry nowadays. For example:

  • The List of unit testing frameworks lists more than 40 different unit testing frameworks for JavaScript, 35 for Java, 64 for C++ and 28 for .NET;
  • The Comparison of file comparison tools has 24 different text file comparison tools, plus a couple of others I have used in the past that aren’t on that list;
  • When it comes to code review, the list is also long: Phabricator from Facebook, Gerrit from Google, Crucible, … For more, just search online.

Lastly, look at how many programming languages we have.

It’s inevitable: when more people use a tool, it becomes harder for that tool to meet all their needs, including volatile and subjective measures like “easy to use” and “good UI”, and to meet those needs in time. As a tool gets more bloated from more people contributing to it, it becomes harder to learn. As a common library is used in more systems, the odds of a change to it breaking somebody grow higher, and the cost and difficulty of ensuring no such regression grow worse than linearly. As the number of people and organizations sharing the same tool grows, it reaches a point where these costs, risks and difficulties outweigh the benefits of continuing to stick together. Then the only natural thing is to fall apart.

It’s just like making meatballs. A small meatball stays a ball for days. But as the meatball gets bigger, it becomes harder to hold together. When the size reaches a certain point, no matter how much flour you add and how hard and how long you press the meat together, the big meatball will just fall apart as soon as you put it down. That’s the maximum possible size of a meatball.

A Parking Lot in Downtown San Francisco and the Free and Open Internet

In my experience, if a resource is 1) valuable, 2) free of fees, 3) in limited supply and 4) open to use, it will inevitably get abused. Its available capacity and liquidity will sooner or later drop to near zero.

That’s just human nature. For example, if there is a parking lot (=valuable) in downtown San Francisco (=limited supply) which is free to park in (=no fee) and has no time limit such as “free to park up to 2 hours” (=open to use), it will inevitably be full all the time. People will come in the morning and leave their cars there for the whole day — why not? It’s free, with no time limit. If I pull my car out to run a few errands, I’m afraid my spot will be taken immediately and I won’t find one when I come back. Instead, I’d better put a folding bicycle in the trunk: I can ride the bicycle around downtown and keep my car parked there all day for free.

This actually happened at my workplace, more than once. The pattern was:

  • We had a test cluster on which people could create virtual machines to run their tests (=valuable resource)
  • The test cluster was already paid for by our team budget, so everyone in the team could use it for free (=no fee)
  • The test cluster wasn’t big enough for everybody to use freely, nor could we increase the capacity (=limited supply)
  • In the beginning, we didn’t apply any policy (=open to use)

Very soon the test cluster was full and stayed full. We found that people just didn’t delete their virtual machines after their testing was done. The behavior was self-reinforcing: the fewer people released their virtual machines, the harder it was to get a new one. Everybody suffered, even though we repeatedly told everyone, “please use it responsibly and be considerate of others’ needs.”

In the end, we set up quotas and time limits: each person can have up to N virtual machines, and getting more than that requires manager approval; each virtual machine is automatically deleted after X days unless explicitly renewed. That solved the availability and liquidity problems, and no one complained about not having a “free and open” test cluster.
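
A hedged sketch of what that policy boiled down to (the quota of 5, the 7-day lease and the VmLease type are all hypothetical placeholders for the N and X above):

using System;
using System.Collections.Generic;
using System.Linq;

record VmLease(string Owner, string VmName, DateTime ExpiresUtc);

static class ClusterPolicy
{
   const int QuotaPerPerson = 5;                                   // hypothetical; exceeding it needs manager approval
   static readonly TimeSpan LeaseLength = TimeSpan.FromDays(7);    // hypothetical

   public static bool CanCreate(string owner, IEnumerable<VmLease> leases) =>
      leases.Count(l => l.Owner == owner) < QuotaPerPerson;

   public static VmLease NewLease(string owner, string vmName, DateTime nowUtc) =>
      new VmLease(owner, vmName, nowUtc + LeaseLength);

   // Run periodically: anything past its lease and not explicitly renewed gets deleted.
   public static IEnumerable<VmLease> ExpiredLeases(IEnumerable<VmLease> leases, DateTime nowUtc) =>
      leases.Where(l => l.ExpiresUtc <= nowUtc);
}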

This is why there won’t be a “free and open” Internet without fees or quotas, and why President Obama’s rules won’t help, and may even make things worse. In his statement today on net neutrality, he put forward a few rules, including:

  • No blocking
  • No throttling
  • No paid prioritization

Internet bandwidth is a valuable resource in limited supply. To avoid what happened to the parking lot in downtown San Francisco and to my team’s test cluster, we have to go one way or the other: either set quotas (blocking/throttling) or charge fees (paid prioritization). Valuable resource + limited supply + no fee + open to use = no one can use it.