Part 1: Terminology
- Understanding how software typically breaks is a requirement for successful secure design.
- A way software can break is known as a weakness. A single security issue is known as a vulnerability. The weakness is exploited via one or more vulnerabilities. (Often the necessary software and activities required is called an exploit.) When you fix a security issue, you fix a vulnerability or you redesign the system not to exhibit the weakness.
- In practice, only exploitable vulnerabilities matter. There can be weaknesses and vulnerabilities that cannot be exploited. Hence, the existence of a vulnerability is not always a proof of exploitability.
- The act of exploitation is one type of an attack. Note that an “attack” does not imply a malicious intent: If I am testing a system legally, I am still attacking it.
Discussion: Given the attacker, who is the defender?
- A security risk can exist whether or not there is a weakness. A risk is a probability (likelihood) of a system having a weakness, and the impact of someone exploiting a vulnerability.
- In many cases, the choice of whether or not to treat a risk boils down to the loss expectancy of the risk, compared to the cost of treating the risk. The tricky part here is to be able to quantify the likelihood of the risk event, which in security is usually much harder than quantifying the impact.
- A threat is an actor that may exploit a vulnerability (or an event that someone does it). The terminology of a threat varies a bit according to source. If you want to be specific and talk about the person, talk about threat actors or something like that.
- Weaknesses are catalogued by CWE. Vulnerabilities are catalogued by CVE.
- Vulnerabilities are security bugs or flaws. These can be implementation-level, design-level, or even business-case level, or span several levels.
- The way to deliver the exploit is an attack vector. The more exposure the system has, the larger attack surface it has. Usually, if a system is complex, it has more attack vector possibilities and a larger attack surface. We will get into these concepts more closely in sessions 4 and 5.
Discussion: How could adding a security feature make a system less secure?
- The difference between an architecture-level and implementation-level issue is fluid.
Discussion: If your WordPress blog is hacked, is it WordPress’ bug, or was it your fault because you made the architectural decision to use WordPress?
Part 2: A very quick demo of binary exploitation
Discussion The purpose of this demo is to show how data ends up as executable code. This is probably the lowest level we go in terms of technology on this course.
- One of the weakness categories is where an attacker gets arbitrary code to run in your system as your process.
- This is a particularly interesting category of weaknesses, because it has a possibility of delivering very devastating attacks.
- We will quickly remind ourselves how operating systems work:
- Program code is loaded into memory for execution. It is executed as a process. Modern operating systems ensure that processes cannot access each others’ memory areas - except shared memory.
- A process has several sections of memory. The executable binary code resides in the ‘text section’. The other interesting areas are the data storage areas of heap and stack. If the process has linked libraries (DLLs, SOs), the code in those libraries is also regarded as a part of the process. (This has meaning for risk analysis; plugins that are DLLs/SOs get executed with your process privileges.)
- The following picture shows an example process layout widely shown in literature. To see a real one on Windows, use VMMap (sysinternals), on Linux,
A simplified image of a process memory layout
- The processor has a register called the Instruction Pointer (which we denote by IP; do not confuse with Internet Protocol). This points to the memory address of the binary opcode to be executed by the processor.
- When the binary performs a jump to a function, the operating system places the call parameters on the stack, followed by the current IP. When the function ends in a return opcode, the previous IP value (return address, below) is popped from the stack, and execution continues from there.
- Also the location of the previous stack frame (“frame pointer”) is stored, and after that, memory is allocated for any local automatic variables declared in the function we just entered.
- The stack is a collection of these stack frames. The current function is at the top of the stack (i.e., the bottommost frame in the picture).
A more detailed look at a stack frame
Demo: We explain this all by conducting simple stack-based old-school buffer overflow exploitation on Linux. For the demo, we are using a legacy 32-bit Linux with OS’ anti-exploitation mechanisms turned off.
Note: This also immediately leads to discussion of how exploitable vulnerabilities can be made non-exploitable through other design strategies, and again the cat-and-mouse game of how those design strategies get bypassed.
- It may be possible for an attacker to overwrite the value of the IP stored on the stack, and thus point the processor to code selected by the attacker. We will use this ‘old-school’ technique to explain what is happening.
- Overwriting can also impact other local variables. Integrity of those variables can change how the process executes, so buffer overflows may have security impact even without arbitrary code exection.
- The canonical example is a buffer overflow resulting from careless memory copy operation. One such can be strcpy() in C. The issue is that one has to be really careful when using these functions, because it is very easy to copy too many bytes. If this happens, the overflowing bytes will overwrite something in the memory located after the target memory area.
- An attacker could feed an input to the system where a programming error ends up copying the attacker-controlled input over something else. If this ‘something else’ happens to be a stack frame allocated from the stack, the attacker may be able to overwrite the IP value stored on the stack frame. Now, when the processor hits a return opcode, the processor continues execution from the IP value that the attacker overwrote the real one with.
- The next step for the attacker is to provide some attacker-controlled code. In the easiest case, the data that overwrote the IP also contains this code, and it gets written to the stack. The attacker will then choose an IP value to overwrite the return value on the stack that will cause a jump to the attacker’s code.
- The remaining challenge with this old-school method is that we don’t really know the memory address our exploit ends up in. Stuff like environment variables and the command name shift our code back and forth from one environment to another.
- Hence, we need to do slight trickery, and use a method called a ‘trampoline’. Here, after the overflow and when returning from the function, the stack pointer (which points at the top of the stack) actually points to our exploit code. If we now overwrite the return address with an address that points to the text segment, into a place that translates into a jump into wherever the stack pointer is pointing (“JMP ESP”), we can always start executing our payload irrespective of the specific address it ended up in.
- Another way to make the exploit more reliable is to pad it with “NOP sleds”, series of “No Operation” opcodes, and try to get the IP to land somewhere inside the NOP sled.
Stack frames. A normal stack frame on the left, exploit code below the return address in the center, and exploit code above the return address on the right.
- This fairly simple (by modern standards) attack is becoming harder. There are various architectural aspects that contribute:
- Stack and heap areas should not contain executable code. On modern processors, it is possible to define these memory areas as “non-executable” (NX, XN (“eXecute Never”), XD (“eXecute Disable”). The feature is enabled by DEP (“Data Execution Prevention”) on Windows. If the processor is told to jump to stack, and start executing attacker code there, the process will be terminated.
- So called stack canaries or stack cookies are values that are written on the stack with (i.e., before) the return address, and their integrity is checked when returning. If an attacker mistakenly overwrites a stack canary, the process is terminated.
- NX can be countered by the attacker through Return Oriented Programming (ROP). In this model, there is no executable code in a non-executable memory section. Instead, the attacker overwrites the stack with a series of return addresses, each jumping back to the main binary or one of the loaded dynamic libraries (hence sometimes also known as “Return-to-libc”).
- The attacker will examine the target binary to find short snippets of code, called “gadgets”, and by chaining them, is able to perform something - for example, to set up parameters for a system() call and then call it. These gadgets can be found, and even exploits built, by automated tools.
- ROP, in turn, can be countered by randomising the memory address where binaries are loaded. This way, the addresses to jump to, including those of gadgets, change from invocation to another, and it is difficult to write an exploit that reliably chains them, especially across systems. This technique is known as Address Space Layout Randomization (ASLR).
- ASLR can be implemented in various ways. Windows executables that are linked with a specific linker option are patched at runtime so that they can run from different memory addresses. On Linux, ASLR-enabled binaries are compiled into PIE (Position Independent Executable) binaries that can be run from an arbitrary memory location. Dynamic libraries (i.e., libraries that are linked dynamically at runtime) are often prelinked (library relocation is precalculated) for performance reasons, and the relocation of libraries can be randomised in prelinking as well.
- ASLR may have a performance penalty (e.g., with PIE on 32-bit x86 systems) or in embedded systems where libraries never change and prelinking is only done once at firmware creation stage, these strategies may not be good enough. ASLR should not be seen as a silver bullet solution (like nothing else really).
NX/XN, stack canaries, and ASLR are not necessarily universally in use in all targets. For example, on a certain platform, shared libraries that were not built to allow NX would also force their parent process not to have NX. Stack canaries are implemented by code in function prologue/epilogue and if not compiled in, they may be missing. ASLR might suffer from bad entropy or only partial randomisation (leading to predictability) or it might not be used for some binaries, or perhaps the underlying kernel doesn’t use ASLR. The attacker could, therefore, find modules or other executables reachable from your attack surface that just lack these protection methods.
Note: There is a lot to be learned in binary exploitation. I am not the right person to learn binary exploitation from, and this is not an exploitation course but instead a software security course, so this is the extent to which we go. But if you are into this sort of bit-twiddling and can live with Windows, I can personally recommend the commercial Corelan trainings (e.g., the Corelan Bootcamp), especially if you get someone to pay for it.
Part 3: Discovering input processing problems with a fuzzer
Note: Fuzz testing, discussed below, is the topic of this week’s exercise. If we have extra time, we may have a demo of the tools.
- Ok, as a defender, you’re going to build your binaries with ASLR, NX, and stack canaries. However, those really are the last line of defense; your program has already crashed and it might not be running on your nice and patched box. So, you need to ensure that you don’t even crash. So you need to test for input processing problems.
- There are various input processing problems, to name a few:
- Overflow: A copy operation overflows the intended buffer and overwrites bytes beyond the buffer. These can be stack or heap based.
- Format string attack: Many functions use format strings that dictate where variables are to be placed (like %s for a string). Putting format string characters in input may cause the program to write out contents of variables into the string.
- Out-of-bounds read: A read operation reads memory beyond the intended memory area.
- Use-after-free: A memory location is referenced (read or written) after being freed.
- State machine issues: The system accepts wrong inputs in a given state, or a wrong input causes a state change that should not happen.
- Privilege elevation for inputs: The system acts on the inputs that it shouldn’t be acting on, because the input comes from a party who doesn’t have the authority to do that. SQL injection is one example: The system executes SQL statements supplied by the user.
- Denial of Service: The system stops processing (or timely processing) due to an input. This may be a direct result of the input or a subsequent state due to a state machine problem.
- These problems can be detected in many ways. The two main categories are:
- Static analysis: Program code is inspected “statically”, that is, when it is not executing. This can be manual, basically reading code, or helped by a tool. Static analysis ranges from simple pattern matching to syntax checking (“linters”), and the most advanced ones trace execution flows and branches. (Lecture 3 will have more on static analysis tools.)
- Flaw injection: Incorrect inputs are fed to the program, and its execution is observed. Contrasting with static analysis, flaw injection is dynamic testing. Observing the target can be done, for example, by running the target in a debugger or a dynamic test harness, or just by monitoring the process and its CPU usage.
- Here, we will discuss flaw injection that is intended to discover input processing issues. Getting back to the definition of secure software - “software is secure if it does what it is supposed to do, and fails gracefully if it cannot” - this testing is often called robustness testing. A specific subclass of this is fuzz testing or fuzzing.
- Again, within fuzz testing, there are a variety of strategies:
- Generation-based or model-based fuzzing, where the syntax of the input is modelled through its specification, typically BNF (Backus-Naur Form or Backus Normal Form), ASN.1 (Abstract Syntax Notation) or an XML (Extensible Markup Language) doctype. Then one or more of the fields or parts of the syntax are manipulated. Model-based fuzzing has large and guaranteed input space coverage with a smallish number of test cases, but requires heavy preparatory work.
- Mutation-based or sample-based fuzzing takes a number of valid data types and breaks them in various ways (bit flipping, truncation, repetition, shuffling, replacement, etc). This can be automated to a great degree and does not require a specification (only samples). The input coverage is not guaranteed.
- Instrumented, or synthetic corpora fuzzing looks at the executing target, determines which changes in the input cause new code paths to be activated, and then changes the input accordingly. This type of fuzzing can be made more effective by feeding it with sample-based strategies.
- Blind fuzzing applies the transformations to inputs without any samples or knowledge of the input. It is usually the least effective way, but may be the only solution if we have unique or one-off inputs that cannot be replayed to the target system.
- Once a fuzzed input is fed to the system, the system is observed. The expected, “good” way to process the corrupted input is to drop the input.
Discussion: Why is silently dropping a bad input often the best way to handle broken inputs? When will this strategy fail?
Discussion: Let’s assume we are fuzzing an image viewer. Some of the test cases cause garbled images to be shown, but the program works otherwise just fine. Is this a bug, or indication of a security problem?
Discussion: You are blindly fuzzing a docx Windows Word document. If I tell you that it is a bundle of zipped xml files, exactly which functionality are you actually testing?
- If the system exhibits a bad response to the fuzzed input, it is usually a crash (process terminates), or a hang (process goes to an undefined state). The crashes can be analysed further to determine whether the crash could be exploitable.
- In its simplest form, observation of the system can be done by watching the processes and their CPU usage to catch crashes and hangs. Running the target under a debugger, such as gdb, has the benefit that you can get a stack trace from the crash and use that for classification of the issue (see next part). Test harnesses, such as Valgrind and AddressSanitizer, can be used to detect memory errors.
- In defensive systems engineering, all inputs that cause undefined systemic responses are bugs / flaws.
Discussion: Is the famous Windows Blue Screen of Death a controlled failure?
Discussion: If the developers claim that the crash is not a bug because the input is not supposed to exist, or will never happen, is this claim acceptable?
Discussion: If the developers then claim that the crash has no security relevance because the process is sandboxed and has no availability requirements, would you still consider it a security issue? Which questions would you ask to determine the remaining security-relevant exposure?
Part 4 (Optional): Advanced Fuzzing Topics
Automated exploitability analysis
- Automated exploitability analysis
- NOTE: Automated exploitability analysis with a triage tool, below, is extra material and not demonstrated during the lecture.
- Crashes can be analysed for exploitability. This is necessary when there are too many crashes to fix them all. This is used for prioritisation of fixing work, known as triage.
- Exploitability analysis uses a set of heuristics, including
- Did the program crash after a return instruction?
- Did the program try to use an address on the heap that was already freed?
- Did the program crash on a branch instruction?
- …and so on.
- Exploitability tools:
gdbexploitable plugin on Linux
- CrashWrangler on Mac
- Exploitability tools necessarily only provide a lightweight analysis of the crash. The fact that an exploitability tool determines that a crash is exploitable does not mean it actually is; and a “non-exploitable” crash could easily have systemic effects which make it useful in an attack.
- Exploitability tools also calculate an unique hash of the call stack. This can be used to determine which of the crashes are duplicates (i.e., share the same root cause: the process goes through the same function calls just before the crash).
Discussion: What sort of situation would even warrant the need for automated exploitability analysis? From the attacker’s perspective? From a company’s security team perspective?
- Corpus distillation
- Corpus distillation is an optional topic, and likely not be covered during the lecture.
- A way to increase the effectiveness of sample-based fuzzing (see earlier in the material) is to select a very good set of positive, valid samples (the corpus).
- Corpus distillation may increase effectiveness if your fuzz runs are bounded by time. If you can conduct large scale fuzz testing over an arbitrarily long period (e.g., in a Continuous Integration system), it may not be cost-effective.
- Corpus distillation is a process where a large set of valid test cases is reduced into a set that has the potential to maximally exercise the target system.
- A way to do corpus distillation is to first obtain a large number (say, tens of thousands) of valid cases; run them through a target system with coverage analysis instrumentation; and then select those samples that cause the most differing coverage, with the hypothesis that those exercise the most different decision flows, and fuzzing those has most probability to hit interesting things. Ideas on how to do this include:
- Checking which ‘basic blocks’ (unique blocks of code between branch instructions) are being executed. Different sets of basic blocks indicate different execution flows. As an example, Valgrind nicely provides this data.
- Checking function call graphs. This is on a higher level but can still yield differences between the runs.
- Interestingly enough, you may not need to do the corpus distillation analysis with the target system, but perhaps with another system that processes the same kind of data, and still get good results. For example, if you aim to fuzz an image viewer on a mobile phone, but cannot instrument the viewer on the phone, you could do corpus distillation using an image viewer on your desktop computer.
- You can also synthesise the corpus through instrumenting the target and then trying out which inputs cause new code coverage. Here, you would not do distillation of existing valid cases but instead synthesise an invalid case. Currently, the American Fuzzy Lop is probably best-of-breed in this category of fuzzer tools.
This is a list of useful documents that will enhance your understanding of the course material. The reading list is session-by-session. “Primary” material means something you would be expected to read if you are serious about the course, and may help you to do the weekly exercise; “Additional” material you may want to read if you would like to deepen your understanding on a specific area.
- Have a look at the CWE (Common Weakness Enumeration) project. It classifies different types of security failures - an example would be https://cwe.mitre.org/data/definitions/118.html. Click on the “Research View”, and you see a list of all the weaknesses catalogued by CWE.
- Have a look at the National Vulnerability Database. Each publicly known vulnerability gets a CVE (Common Vulnerabilities and Exposures) number that can be used as a shorthand to refer to the issue (“Have you already fixed CVE-XXXX-YYYY?”). As an example, see https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2011-1472. You will see an explanation of the issue, affected systems, the CWE classification of the problem (see previous bullet) and a CVSS (Common Vulnerability Scoring System) score, which is intended to convey how serious the issue is. Almost all security reports and release notes use CVE numbers nowadays.
Note: Yes, much of this stuff is old, to the tune of 10 to 20 years. The basics are that old, but you can still feel pretty contemporary when you think how old your introductory Physics class stuff is!
- If you want to get more technical details on the mitigations and bypass techniques, here are three summaries.
- Aleph One (Elias Levy): Smashing the Stack for Fun and Profit. Phrack Magazine, vol. 7, issue 49. This article, from 1996, is the seminal work on the “old school” exploitation of buffer overflows, like the one demo shown during the lecture. It is not purely of historical interest - it is a very readable and concise explanation of what is happening in a system which doesn’t have modern protection measures.
- Theo de Raadt: Exploit Mitigation Techniques. This presentation from 2005 describes the anti-exploitation techniques as implemented in OpenBSD (and currently in most other general-use OSes). Explains stack canaries, non-executable data areas, and address space layout randomization, with a lots of pictures of what they mean. Typeset in glorious Comic Sans.
- Buchanan et al.: Return-Oriented Programming: Exploitation without Code Injection. This 2008 presentation describes the generalisation of return-to-libc attacks into ROP. It gives a view of exploitation techniques in operating systems that implement the mitigation systems above.
- Here are a couple of writings for you having interest in fuzzing.
- Ormandy et al.: Fuzzing at scale and the presentation that is linked from this blog post. This 2011 blog post by Google’s security team gives insight to (once) real-life effective fuzzing technologies. They use sample-based fuzzing but they choose set of valid samples carefully (“corpus distillation”).
- American Fuzzy Lop, the synthetic-corpus, instrumenting fuzzer, has a page at http://lcamtuf.coredump.cx/afl/.
This is lecture support material for the course on Software Security held at Aalto University, Spring 2018. It is not intended as standalone study material.