step 0: assume a malicious universe

Here’s a thought puzzle for you… given the following line of computer code, “what could go wrong?” That is, what kinds of issues could arise from submitting that code to your favorite programming language interpreter (you do have a favorite… right?)

n + 4

I often tell my university students, “step 0: assume a malicious universe.” Particularly in cybersecurity, but also in programming writ large, any unverified assumption about the state of the world can be used against you, and eventually will be.

n + 4

seems so innocuous… but even here there’s a flood of paranoia that washes over a jaded programmer. Here are the things that crossed my mind in rough order from “the bottom up”—that is, starting from the hardware level, “up” through the OS level, “up” again to the language interpreter level, “up” again to your code evaluation environment, and finally “up” to the rendering level where your development environment decides how to display the underlying data. By no means do I believe this to be a complete list.

Infrastructure

  • While unlikely to happen in the nanosecond between when you press ENTER and see the result of your computation, electrical power could fail—I suppose that for a running computer, it’s always the nanosecond between one computation and the next!
  • The computer you’re using might have a flaw in its math processing circuits (e.g., https://en.wikipedia.org/wiki/Pentium_FDIV_bug)
  • The computer may work fine on the Earth’s surface, but when you decide to send it into orbit or use it near a nuclear reactor, ionizing radiation could cause random changes in bits (0 to 1, 1 to 0), so you’d need to radiation-harden your system (see https://en.wikipedia.org/wiki/Radiation_hardening)
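
To make the bit-flip case concrete, here is a minimal sketch in Python (my choice for illustration; nothing above depends on a particular language). The helper flip_bit and the starting value 38 are hypothetical, and a real radiation event would of course not be this polite about which bit it picks.

```python
def flip_bit(value, bit_index):
    """Return value with one bit inverted, as a stray particle might do."""
    return value ^ (1 << bit_index)


n = 38                         # binary 100110
corrupted_n = flip_bit(n, 5)   # bit 5 (worth 32) flips from 1 to 0

print(n + 4)            # 42
print(corrupted_n + 4)  # 10
```

One flipped bit is enough to turn the expected answer into a very different one, and nothing in the expression n + 4 itself would signal that anything had happened.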

Operating System

  • The computer you’re using might have other, unrelated code running that prevents your code from accessing the CPU (AKA a denial of service attack)
  • The computer you’re using might have other, unrelated code running that alters the values your code tries to send to or receive from the CPU or main memory (an in-memory attack, such as a virus)
  • The computer you’re using might have other, related code running that alters the values your code relies upon—this is a problem with data that are mutable (changeable) and are shared by at least two concurrently running processes (or threads or microservices or…); many possible mitigations have been devised over the years: semaphores, locks, and broadly thread safety, for example
  • The computer you’re using AND any other computer in the world might each have related code running that alters the values your code relies upon—think two people trying to book the same concert seat at the same instant
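
As one illustration of the shared, mutable data problem and the lock mitigation mentioned above, here is a minimal sketch in Python using the standard threading module. The names counter, counter_lock, and add_four_many_times are hypothetical, and the thread and repeat counts are arbitrary.

```python
import threading

counter = 0                      # shared, mutable state
counter_lock = threading.Lock()  # guards every update to counter


def add_four_many_times(repeats):
    """Add 4 to the shared counter, holding the lock for each update."""
    global counter
    for _ in range(repeats):
        with counter_lock:
            counter += 4


threads = [
    threading.Thread(target=add_four_many_times, args=(10_000,))
    for _ in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 160000
```

The read-add-write sequence behind counter += 4 is not guaranteed to be atomic, so without the lock two threads could read the same old value and one update could be lost. The lock only helps within one process; the concert-seat case across machines needs a coordination mechanism shared by all parties, such as a database transaction.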

Language Interpreter

  • The language interpreter you’re trying to use may not be installed on your computer
  • The language interpreter you’re trying to use may not recognize the infix syntax of n + 4 (for example, Lisp; HT Brian Danielak)
  • There might be a defect in the language interpreter you’re trying to use that mishandles any aspect required to compute n + 4
  • The language interpreter you’re using may not recognize the infix + operator

Code Evaluation Environment

  • The language interpreter you’re using may NORMALLY recognize infix +, but another programmer may have overridden the definition of + (an old trick played by many a CS student on peers)
  • The character that appears to be the English lowercase letter “n” (or the + or the 4 or even what appear to be spaces on either side of the infix +) may, in fact, be some different character; this is known as a homoglyph attack (see https://www.irongeek.com/homoglyph-attack-generator.php). For example, text can apparently show the characters W I L L I A M D O A N E even though none of those letters are the standard ASCII letters they appear to be
  • n may be undeclared (there’s no previous mention of n in your code)
  • n may be declared, but it may have an arbitrary value—this happens in some languages where declaring a variable allocates memory to store the value, but doesn’t automatically initialize that memory address with a known value, leaving you with whatever set of bits were last left at that memory address by other programs
  • n may be declared, but it might be NULL or NIL, an explicit “no known value” value provided in some languages
  • n may be declared and defined to have a specific value, but it might not be a numeric data type (n = “bob”) which could lead to issues with automatic type conversion
  • n may be a declared, defined, numeric data type, but could represent something other than an integer, for example it could be a complex number (e.g., 2 + 4i) which would produce the correct mathematical result (6 + 4i), but you might not be expecting a complex number as the result
  • n might be a defined integer so large that adding 4 to it overflows the computer’s ability to represent the value, which would generally result either in an explicit error or a silent “wrap around” of the value of n from the most positive value able to be represented by the computer (let’s say 2^63 – 1, the largest integer representable as a signed 64-bit binary integer) to the most negative value (-2^63)—this is known as an overflow error (see https://en.wikipedia.org/wiki/Integer_overflow). An underflow error occurs when the wrap around is from the low end of representable values to the high end.
  • n might be such a small number (e.g., 0.0000000000000001) that adding it to 4 counterintuitively returns exactly 4; that is, 4 plus some very small number beyond the computer’s floating-point precision equals 4
  • n may explicitly be “not a number” (NaN), which some programming languages (JavaScript, R) support as a distinct value. What’s NaN + 4?
  • n may explicitly be +infinity or -infinity (Inf in R). What’s infinity + 4?
  • The largest integer and the smallest integer representable on your computer may be different than the 64-bit limits I gave: -2^63…2^63 – 1
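
Several of the cases in this list can be reproduced in a few lines. The following is a minimal sketch in Python, chosen only for illustration; other languages will behave differently in the details. The names wrap_int64, looks_ascii, SneakyInt, and try_add_four are hypothetical helpers, not part of any library. Because Python integers never overflow, the 64-bit wrap-around is simulated by hand, and the homoglyph uses U+0578, an Armenian letter that resembles a Latin “n”.

```python
import math

INT64_MAX = 2**63 - 1  # largest signed 64-bit integer


def wrap_int64(value):
    """Simulate signed 64-bit wrap-around (Python ints are unbounded)."""
    return (value + 2**63) % 2**64 - 2**63


def looks_ascii(text):
    """Return True only if every character in text is plain ASCII."""
    return all(ord(ch) < 128 for ch in text)


class SneakyInt(int):
    """Hypothetical prank type: + has been overridden to subtract."""

    def __add__(self, other):
        return int(self) - other


def try_add_four(n):
    """Return n + 4, or the name of the exception that was raised."""
    try:
        return n + 4
    except TypeError as exc:
        return type(exc).__name__


results = {
    "overridden +": SneakyInt(10) + 4,         # subtracts instead of adding
    "homoglyph n": looks_ascii("\u0578 + 4"),  # a lookalike, not ASCII "n"
    "None": try_add_four(None),
    "string": try_add_four("bob"),
    "complex": try_add_four(2 + 4j),
    "overflow": wrap_int64(INT64_MAX + 4),
    "tiny float": try_add_four(1e-16),
    "NaN": try_add_four(math.nan),
    "infinity": try_add_four(math.inf),
}
for case, outcome in results.items():
    print(f"{case:>12}: {outcome}")
```

In this sketch the overridden + yields 6 rather than 14, the None and string cases raise a TypeError, the overflow case wraps to a large negative number, the tiny float vanishes into 4.0, and NaN and infinity simply propagate.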

Rendering Issues

  • The result may be correct in the computer’s memory, but when output (to screen or paper) might be so large that the language interpreter or development environment you’re using “conveniently” represents the value in an alternate representation, such as scientific notation, thus masking the digits to the far right: 5.1 x 10^10 (often rendered in computing as 5.1e+10) could represent 51234567890 or 51234567891 or 51234567892 or 51234567893….. or even 51467890123
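
The masking effect is easy to see with a short sketch in Python (again only an illustrative choice). The format specification ".1e" stands in for whatever an environment might pick on its own; the variable names are hypothetical.

```python
values = [51234567890, 51234567891, 51467890123]

# Render each value in scientific notation with one digit after the point.
rendered = [f"{value:.1e}" for value in values]

for value, text in zip(values, rendered):
    print(f"{value} is displayed as {text}")
```

All three distinct integers are displayed as 5.1e+10, so the display alone cannot tell you which one is actually in memory.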