
Framing the topic, business level analysis

  • Vocabulary alert
    • Threat and risk are often used interchangeably; threat modelling and (security) risk analysis often mean the same, or a similar, thing. This is not strictly accurate, but it is how the terms are used.
      • Threat is the threat actor, a human.
      • Risk is the probability (likelihood) of a bad thing happening, multiplied by its impact.
      • This means that ”threat modelling” might not actually be ”threat” modelling at all, but rather risk modelling. However, the term ”threat modelling” is widely used and understood, so instead of trying to change the world, I’m going with it.
  • You can have a risk, or a threat, even though you do not necessarily know you have a weakness (and a resulting vulnerability)
  • ”Someone will inject into our database” can be a valid risk even though we don’t know whether there is an injection problem or whether we have a plausible threat actor. However, if you have a basis to argue that you do not have such weaknesses, or that nobody is interested in your database, the risk is likely to be small.
  • Here, we look primarily at technical analysis. There can be many other levels, and some will refer to quite high level business environment threat analysis as security risk analysis.
    • For example, whether there will be a new paradigm of device use (such as ”Bring Your Own Device” was) is a valid discussion… but not necessarily for the immediate technical analysis of a system.
  • Here, during the next two lectures, our security risk analysis will assume:
    • We have a specific technical system or solution that we want to analyse
    • We actually have a clue about its architecture and design (or we’re going to design it very soon)
  • Even so, there are two levels of risk analysis:
    • Business level, that is mainly about risks to the value creation
    • Technical (or architectural / design) level
  • Business level security risk analysis
    • Business level analysis greatly helps to frame the technical / design level security risk analysis if the actual business value creation logic is clear:
      • Some things that may be technically security risks might not be business risks at all - hence mitigation may not be necessary.
      • Some things that an attacker could aim at may not be evident from the architecture alone.
    • Business level analysis is a key part of Privacy Impact Assessment (PIA), which we will tackle in Session 5.
    • Business level security analysis should be done with the customer representative. In many organisations, a direct customer contact is not available. In this case, a ”Product Owner” or ”Product Manager” is typically one person you want to talk to.
    • A business level security risk analysis should:
      • Identify information flows and assets, as an input to the technical analysis later. There may be some information flows that the business folks know about and that aren’t visible in your component but have an overall security effect, such as usage analytics data. It is also very useful to pinpoint personal data at this point.
      • Identify main use cases (user stories) and the misuse cases - what are the worries that business has. You could approach this by identifying the role of this system in the business value chain. What sort of issues in this system would affect the value stream (or actual revenue stream)?
      • Identify the users (including administrators) of the system, and extend this to all the other people who sell, service, or support the product. What sort of view do they have of the system? What kind of information do they receive? What sort of management interfaces do they have towards the system?
      • Identify whether the business has some specific regulations, customer requirements, or certifications that need to be met, and whether those have any software security impact.
    • As can be seen, a business level security risk analysis does involve business people (product owners and service designers, see lecture 3).
    • The main takeaways that will help you later are (a small template sketch follows this list):
      • What is the system used for (business wise)
      • Information asset / flow list
      • List of persons / roles interacting with the system
      • What is the ”top list” of things that should not happen (i.e., misuse cases or “attacker stories”)
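
    As a side note, here is a minimal sketch (in Python) of one way to capture these takeaways in a structured form. The field names and example values are purely illustrative assumptions, not a standard notation.

      # Hypothetical note-taking template for the business-level analysis outputs.
      business_level_summary = {
          "business_purpose": "Accept customer orders and payments for the web shop",
          "information_assets": [
              {"name": "customer orders", "personal_data": True},
              {"name": "usage analytics feed", "personal_data": True},
          ],
          "people_and_roles": ["end user", "shop admin", "support agent", "reseller"],
          "top_misuse_cases": [
              "Order database contents are leaked",
              "Payments are redirected to an attacker",
              "Service is unavailable during a sales campaign",
          ],
          "regulations_and_certifications": ["PCI DSS", "GDPR"],
      }
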
  • Visualising With Threat Trees / Attack Trees
    • Many sources propose visualising threats with ”threat trees”.
      • Example: Shostack: Threat Modeling, Chapter 4 (see reading list).
    • However, most examples of those threat trees in the literature have significant flaws:
      • They are too simple, clean and tree-like (real systems normally are not: issues have multiple causes and effects)
      • An unstructured discussion may not guarantee any specific coverage for analysis
      • They do not readily support one of the most critical processes of risk analysis, which is resolving any ambiguities and underlying assumptions in actual design
    • Threat trees can be useful for noting down the business-level threats from the business-level analysis, but my suggestion is not to try to use them for technical analysis
    • Real threat trees would look more like threat graphs. The most interesting issues would probably be at the graph cliques.

Demo: An actual threat graph from a real-life case

Threat modelling on the technical design level

  • What sort of information you need to know about your system
    • Data processing
      • As we’ve seen from earlier sessions, processing data can go terribly wrong. And if processing breaks, the attacker may gain control of the processing entity (e.g., a process). It is useful to know exactly where in the architecture these potentially attacker-controllable blocks reside.
      • An attacker observing the state of data processing may break security assumptions. As an example, a process could have data unencrypted in memory. Knowing where processing takes place allows you to design protection.
    • Data in motion, data at rest
      • Data requires security services. Depending on the data, it might need, for example, confidentiality and integrity. How these can be offered depends on where the data is stored or being transferred.
      • Data ”at rest” is actually in motion - but in time. It is travelling towards the future. (This is less crazy than it sounds. The environment can change as a function of time, so you might want to think about that!)
      • Of particular interest are cryptographic keys. These are data, but a very specific and interesting type of data.
      • Configuration files are also data.
    • Interfaces
      • An attacker usually needs to interact with the system. This means that there is a data flow to/from the attacker. The place where the data flow crosses a system boundary is an interface.
      • Interfaces are natural places to conduct robustness or other types of testing.
      • The total of all interfaces is the attack surface.
  • The process of resolving ambiguities and assumptions
    • In most real-life situations, exactly what a system does may not be entirely clear.
    • In real-life software development, developers necessarily speak in, and work with, abstractions. You don’t really go through the whole HTTP stack when you define a new REST API.
    • Abstractions are necessary for effective work, but they can also mask wrong assumptions. Have you ever seen an architectural diagram that has a black box? Do you know what the black box really does? Does everyone around the table agree?
    • In security analysis, opening these black boxes and surfacing the broken assumptions is very important.
    • In all its simplicity, it is useful to ask ”how” and ”what exactly” many times, until you arrive at a level where security analysis is meaningful.
    • What’s a suitable level? When have we resolved ambiguity well enough?
  • Is this enough?

Really not a very good view of a web server

  • How about this?

A much better internal view of the web server

Discuss: Apart from the fact that Microsoft PowerPoint clearly is the wrong tool for drawing architectural risk analysis diagrams (this is why we do it on a whiteboard instead), even a simple assumption of a ”web server” can turn out to be a can of worms. The picture above is pretty bad; it doesn’t, for example, show data flows properly, and it only makes a first stab at opening up the complexity of the processing blocks of the system. What is missing? How would we drill down into it more?

  • The end result should be able to explain the system in sufficient detail, meaning (a small sketch of recording one data flow in this way follows this list):
    • All data processing blocks have been divided down to process level.
    • Libraries that are statically linked have been identified.
    • Dynamic libraries and plug-ins have been listed and identified.
    • We know what frameworks, virtual machines and other underlying technologies are being used, and which languages things have been implemented in.
    • Location of data stores is known - for example, where in a file system, or in which database, under which database user.
    • Every data flow has the whole protocol stack under it clarified. Is it HTTP? Does it have TLS?
    • Every protocol step is clear. In the web world, this is usually HTTP request/response, but there may be more complex ones.
    • We know exactly what data is being stored or processed. What’s the content, format and encoding? What data came from outside the system (could be attacker controlled)?
    • We know where configurations, keys and certificates are stored.
    • We know where personal data (i.e., personally identifiable information) or otherwise sensitive data (e.g., credit card numbers) are stored.
  • You do not need to do all this up front. There can be other strategies. For example, you could do analysis one use case (”user story”) at a time, or whenever a new feature appears on the product backlog or requirements list.
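
    For illustration, a minimal sketch (in Python) of how the answers above could be written down for a single data flow. The structure, field names and example values are assumptions made for illustration, not a prescribed notation.

      from dataclasses import dataclass

      # Hypothetical record for one data flow in the threat model.
      @dataclass
      class DataFlow:
          source: str                   # processing block sending the data
          destination: str              # processing block receiving the data
          protocol_stack: list          # lowest layer first
          data_content: str             # what is carried, format and encoding
          attacker_controllable: bool   # does any of this data originate outside the system?
          notes: str = ""

      login_flow = DataFlow(
          source="browser (black box)",
          destination="web server process",
          protocol_stack=["TCP", "TLS 1.2", "HTTP/1.1", "JSON"],
          data_content="login form: username and password, UTF-8 JSON",
          attacker_controllable=True,
          notes="TLS terminates at the load balancer; HTTP is passed through in plaintext",
      )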

A picture is worth a thousand words

  • Design level threat modelling is really, really helped by a picture.
    • Humans usually remember 5 +/- 2 concepts at a time. Most systems have more parts than this. A picture helps.
    • Resolving ambiguities is very effective if the picture is drawn while others comment. Usually, one can’t draw a detailed picture unless the ambiguities are removed.
    • I don’t personally like pre-drawn pictures that a lone architect has produced; these contain the assumptions and ambiguities of that one person. So I’d usually not use pre-drawn pictures even if they were available.
    • I’d rather draw the picture on a whiteboard and then (if an architectural picture is needed for something else) copy it off there than vice versa.
    • Do not worry about notation that much. Especially, forget trying to do clean UML.
    • A whiteboard works better than a drawing tool. This is why all the examples I’ve drawn below are drawn with an actual pen on actual paper. I also recommend you do the same, because otherwise you will spend a lot of time cursing at your diagramming software, time that would be better used doing analysis.
  • The most useful things to draw
    • Data Flow Diagram (DFD)
      • Shows processing blocks and data flows between them.
    • Message Sequence Chart (MSC)
      • Shows message traffic between communicating entities
      • Useful especially for more complex protocols (complex = more than two parties, or more than two messages)
  • One way to work is to draw a DFD and if the number of communicating peers, or number of messages, is >2, then draw an MSC
  • Usually NOT VERY useful things to draw:
    • Class diagrams
      • Classes may encapsulate data, but we’re more interested in run-time encapsulation (i.e., processes).
      • Sometimes it is useful to know about classes - for example, when objects are serialised into data, or when state is being maintained by a singleton object. But rarely worth drawing.
    • How source code is organised
      • The only exception here is that it is useful to know whether certain code comes from a third party
      • Analysis is performed against a running instance of a system, not against how it looks in version control
    • UML Use Case Diagrams
      • Mostly information that can be much better explained using plain text
      • Kids, just don’t do it
    • Details of ”neighbouring” components
      • You need to stop somewhere, unless you want to model the whole Internet
      • You can stop at the first processing block that you do not control (develop), and leave that as a black box. For example, if you have a pure server side web app, the browser is a black box. (If you have client-side JavaScript, it is not; then you need to model it.)
      • The data flow to/from that block still needs to be fleshed out in detail
      • All data flows in your DFD must have endpoints; they cannot end in a vacuum
  • Actors are important to draw
    • Sometimes, a data flow stops at a human being. These should be drawn. Your attacker is human.
    • Humans to be drawn include
      • Users
      • Admins (are humans too)
    • Any human could be an attacker, and therefore a source of attacker-controlled data
  • Things people usually forget to draw
    • Admins and admin interfaces
    • Key storage (including SSH keys, etc.)
    • Configuration files
    • Load balancers (even if they terminate TLS!)
    • Client-side code execution in web apps
    • Data flows that exist during system deployment, but not at operational time
    • In MSC diagrams, the requests that lead to responses, and redirects
  • The security boundary concept
    • ”Boxes” in your DFD should be security boundaries.
    • A ”security boundary” is some sort of barrier between two processing blocks that is (externally) enforced upon these blocks.
    • A process is within a security boundary, because (modern) operating systems ensure that processes cannot read or write each other’s memory. A memory protection boundary becomes a security boundary.
    • However, threads within a process do not have their own security boundaries.
    • A virtual machine is a security boundary, because one VM cannot magically jump into another VM.
    • Security boundaries form concentric layers. A Java app runs in a JVM process that can be inside a Docker container, which is inside a VM, which is inside a physical box…
  • Beware of fake boundaries
    • Just deciding to store stuff in different directories is not a boundary, unless directory access controls are enforced from outside (by OS/file system)
    • Usually, processes owned by the same user are really in the same domain from the file system perspective (because the user can control all processes of that user). A small sketch of this follows below.
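
    To make this concrete, a minimal sketch (in Python) of why a directory split alone is not a boundary. The paths are made up for illustration.

      import os

      # Two "components" run as the same OS user and keep their data in different
      # directories. A restrictive file mode only keeps *other* users out; any
      # process running as this same user can still read and modify the file,
      # so the directory split alone is not a security boundary.
      os.makedirs(os.path.expanduser("~/component_a"), exist_ok=True)
      secret_path = os.path.expanduser("~/component_a/secret.txt")
      with open(secret_path, "w") as f:
          f.write("api-key-123")
      os.chmod(secret_path, 0o600)   # rw for owner only

      # "Component B" - a different program, but the same user - reads it anyway:
      print(open(secret_path).read())
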
  • What attacker control of a security boundary means
    • An attacker that can control execution within a security boundary is usually thought of as being able to control everything that happens within that boundary, and as having access to all data within it
    • Although this may not always be entirely true in practice, it is the safe assumption to make
    • Example: If an attacker can execute code within a kernel, the attacker is thought to completely control the operating system, including data flows through it and all security boundaries within the OS
    • Example: If an attacker can execute code on a physical host, the attacker is thought to completely control everything in all VMs running on that host
    • Counterexample: A smart card can provide a secure execution environment even if its I/O is completely controlled by a compromised system around it. However, unless a component has been designed as a secure execution or storage environment, this doesn’t apply
  • Security boundaries as an optimisation for analysis
    • In some cases, it is useful to treat a security boundary as a blob. If you define everything within it as untrusted, you might not need to analyse its internals at all.
    • You could approach a security boundary as an abstract object, and for example, sandbox (isolate) a security domain with a lot of untrusted activity. Bad things can happen inside the box, but hopefully they can’t escape.
    • Many beginners make the mistake of trying to fix problems between components that sit in the same, untrusted, security domain. This often results in a series of minimal security enhancements that make exploitation a bit harder, but still possible. Unfortunately, this sort of work may end up generating significant costs with very little benefit.
    • Example: if the OS is compromised, all apps within it are compromised; apps cannot protect themselves from threats in their ”enclosing” security domain, no matter what sort of obfuscation etc. they would do. (A small sketch of how a compromise propagates through nested domains follows the figure below.)
    • An example of embedded security domains (explanation during lecture):
    Security domains within other domains
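
    For illustration, a minimal sketch (in Python) of the ”domains within domains” idea and of the safe assumption that controlling a domain means controlling everything nested inside it. The nesting used here is only an example.

      # Toy model: each security domain lists the domains directly nested inside it.
      nested_domains = {
          "physical host": ["hypervisor"],
          "hypervisor": ["vm-1", "vm-2"],
          "vm-1": ["docker container"],
          "docker container": ["jvm process"],
          "jvm process": [],
          "vm-2": [],
      }

      def compromised(domain, tree):
          """Return the set of domains lost if 'domain' is attacker-controlled."""
          lost = {domain}
          for inner in tree.get(domain, []):
              lost |= compromised(inner, tree)
          return lost

      print(compromised("hypervisor", nested_domains))
      # => the hypervisor, both VMs, the container and the JVM process are lost,
      #    but not (automatically) the physical host outside that boundary
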
  • Typical boundaries
    • Machine boundary
      • A physical machine boundary, or a virtual machine boundary, is usually a rather strong isolation. Modern operating systems (if kept up-to-date and properly hardened) are usually pretty good at isolating what is inside them from what happens outside them.
      • The problem with a machine boundary is, of course, that an attacker can attack anything that runs within the machine. All applications and services enlarge the attack surface, and once the machine is ”owned” (compromised), everything within that boundary is compromised.
    • Containers (e.g., created using Mandatory Access Control)
      • There are many types of containers that are usually run on top of an operating system. The Java VM is one example. Mandatory Access Control (MAC) frameworks such as AppArmor or SELinux and systems that utilise them such as Docker are other examples.
      • Containers can be isolated execution environments, or they can just mediate accesses to resources.
      • Example: Java VM and browser’s JavaScript engines are execution environments.
      • Example: AppArmor is an example of a system where the operating system checks whether a process has the right to do specific things; it is not a separate execution environment, but the way the process interacts with other parts of the OS is checked and enforced.
    • Processes
      • The process is the most common unit of isolation - modern operating systems do not let processes alter the execution of other processes (that they do not somehow own).
      • However, most operating systems are pretty porous with regard to processes’ rights. A process has many channels through which it could have an impact on other processes, or on the container it is running in.
  • Taint analysis
    • On ”tainting” inputs, please refer to Session 2. We are now extending the concept of tainting from code to communication protocols.
  • Refresh: Protocol stacks
    • The protocol stack is a key concept in protocol engineering. Protocols are stacked on top of each other, and each layer is responsible for some aspect of the communication.
    • An example is a typical web page load, where IP is responsible for getting its payload routed through the Internet; TCP creates a ”stream” of octets by combining several IP packets; TLS on top of TCP provides confidentiality and integrity protection services; HTTP on top of TLS carries the web page request and response, including the web page itself; and the web application may then have an application level protocol that uses HTTP to exchange, for example, JSON objects.
    • There are protocol layers below IP, too. However, IP is the one layer that is end-to-end.
  • Protocol layer termination and passthrough
    • When doing architectural risk analysis, it is necessary to understand where each protocol layer terminates.
    • This termination point is part of your attack surface. You could think of it as a point of code that gets exposed to external inputs.
    • For example, if you have an HTTP proxy in your corporate firewall that does content filtering, then, seen from your desktop or the web server, HTTP probably terminates in that proxy. The application layer on top of the HTTP might not terminate there; it could be passed through as is.
    • Another example, if you have a load balancer in front of your cloud service that handles TLS, and passes the HTTP request in plaintext to your cloud instances, the TLS layer terminates in the load balancer, and HTTP is passed through.
    It is critical to understand where each data flow terminates. Problems are usually caused at the termination point.
  • The concept of termination is important because every location where a protocol is being parsed or acted on is a part of the attack surface. You could, for example, think of an illegal input that would cause processing to fail.
  • If data is passed through without acting on it or parsing it, then that passthrough location is not part of the attack surface. For example, for most IP routers, anything on top of IP is just in a kind of opaque tube. The router just copies bytes from one interface to another, so any malicious activity in that data flow does not have an effect on the router.
    • Care should be taken not to confuse ”real” pass-through (where data is really just being passed on) with protocol termination. If a proxy or a load balancer actually tries to look into the traffic, parse it, and perhaps filter it, that actually terminates the layer, and thus creates a point in the attack surface.
  • This is a very important aspect of resolving ambiguities. If you have black boxes that seem to just be passing traffic through on a certain layer (i.e., do not seem to terminate a protocol), you really have to peek inside each black box to determine whether it really is a pass-through or whether it actually does something to the data that is flowing through.
  • If it does something actively, treat it as a termination point (and as a new traffic source point). A small sketch of mapping termination points to the attack surface follows below.
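
    To make this concrete, a minimal sketch (in Python) of recording, per hop, which protocol layers terminate and which are passed through; every terminated layer is a point on the attack surface. The deployment below is made up for illustration.

      # Toy model of one data flow through a made-up deployment.
      hops = [
          {"name": "corporate HTTP proxy", "terminates": ["TCP", "TLS", "HTTP"], "passes": ["app JSON"]},
          {"name": "load balancer",        "terminates": ["TCP", "TLS"],         "passes": ["HTTP", "app JSON"]},
          {"name": "web server process",   "terminates": ["TCP", "HTTP", "app JSON"], "passes": []},
      ]

      # Every (hop, layer) pair where a layer is parsed or acted on is somewhere
      # that malformed input gets processed, i.e. part of the attack surface.
      attack_surface = [(hop["name"], layer) for hop in hops for layer in hop["terminates"]]
      for point in attack_surface:
          print(point)
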
  • A side note and a reminder from Session 1 & 2: Data as code
    • One layer’s data could be another layer’s code. Typically you would see JSON objects being passed back and forth in a web application. Those can be construed as data objects, but they could be evaluated in a JavaScript context, becoming code.
    • Similarly, vulnerabilities in a parser may cause parts of data to be interpreted as code.
  • Prioritising findings
    • It does matter where the data came from.
    • If you can trust the sender of the data, and can authenticate the sender, you might be able to decide that this potential way to inject malicious data is not a risk. (If you actually authenticate only trusted senders, that is.)
    • Not everything that is technically possible is a security risk.
    • Example: Let’s assume you have a Bluetooth-equipped system and you have identified two attack surfaces:
      • An endpoint that receives data objects over Bluetooth from anyone
      • An endpoint that receives audio data from a paired headset
    • then it is most likely that the former is the bigger risk, and the latter isn’t, because only a paired (“trusted”) party can talk to you over it. You will then want to prioritise the former. Of course, the attack surface of the Bluetooth pairing protocol itself is another question - if that can be subverted, then the second one becomes higher priority again.
  • Security services for data flows & stores; STRIDE
    • Once we have discovered the data flows and data stores, we will look into security services we need to provide for each.
    • Not every data flow or data store requires all types of security services. Public information might not need confidentiality service, whereas it could benefit from integrity protection (so the recipient knows it is authentic information).
    • The traditional model for security services is the ”CIA triad”: Confidentiality, Integrity and Availability. You could use that as a basis too, in which case you would discuss the CIA needs for each data flow and data store.
    • Microsoft, as a part of their Security Development Lifecycle, have come up with two other acronyms, STRIDE and DREAD. The latter has since been disowned by Microsoft themselves, so we will not discuss it either.
    • STRIDE, however, seems to work rather well as a discussion facilitator for data flow analysis. One way to do STRIDE analysis is as follows:
      • Take one data flow, or data store, at a time. In the previous phases, you have already determined the specific protocols and data content, so you should be able to have a detailed technical discussion about it.
      • Consider each of the parts of STRIDE by making a statement that everything is fine. For example, give a technical explanation why the authentication works, or why logging is sufficient.
      • Then try to find flaws in your argumentation. If you have a team of people, one can defend this assurance argument and the others can throw rotten tomatoes at it. (A small sketch of recording these arguments follows the STRIDE list below.)
  • It is usually helpful to start from the highest protocol layer (e.g., application data) and work your way down if necessary. If you apply a security measure on a higher protocol layer, that may already make discussion of lower layer protocols irrelevant, or change the type of security needs lower layers need to provide.
  • If all the layers of a data flow are not end-to-end (e.g., application data is carried over different underlying protocol stacks in different parts of the system), you need to treat each different underlying stack separately. As an example, if application data is passed through an HTTP-over-TLS connection at one point, and a local domain socket at some other point, these are different cases - unless all necessary security services are provided on the application level.
  • STRIDE stands for:
    • Spoofing (Authentication):
      • We know who is calling us / who we are calling, because…
      • We know that the data we got really comes from where it is supposed to come, because…
    • Tampering (Integrity):
      • We know that the data arrives as it was sent, and is complete, because…
      • We know that the data we read from the database / file has not been tampered with, because…
    • Repudiation (Auditability):
      • It is not possible to later deny (or roll back) an event that happened, because…
      • Our audit logging is sufficient to analyse access to data, because…
    • Information Disclosure (Confidentiality):
      • Information can only be read by those who have a right to do so, because…
      • If this data ends up on pastebin, no information can be deduced from it even in the future, because…
    • Denial of Service (Availability):
      • Even if this component stops responding, the system still works, because…
      • It is not possible to overload this component, because…
    • Elevation of Privilege (Authorisation):
      • Code injection in this data flow is not possible, because…
      • Only these specific user roles will be able to perform this action, because…
      • No other component can use another component as a middleman to perform this action, because…
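
    As a side note, a minimal sketch (in Python) of capturing the assurance arguments, and the holes found in them, for one data flow. The structure and the example wording are assumptions made for illustration.

      STRIDE = ["Spoofing", "Tampering", "Repudiation",
                "Information disclosure", "Denial of service", "Elevation of privilege"]

      # One record per data flow: the "everything is fine, because..." claim for
      # each STRIDE category, plus the rotten tomatoes thrown at it.
      login_flow_analysis = {
          category: {"claim": None, "counterarguments": [], "accepted": False}
          for category in STRIDE
      }

      login_flow_analysis["Spoofing"]["claim"] = (
          "We know who is calling us because the client presents a session token "
          "issued after password authentication over TLS."
      )
      login_flow_analysis["Spoofing"]["counterarguments"].append(
          "Tokens never expire, so a stolen token works indefinitely."
      )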

Discuss: The list of assurance arguments, above, is of course not complete. What other types of sources would you use for getting ideas about aspects to discuss?

  • Where to find worries?
    • Inherent technical & security knowledge of engineers? What sort of experience and traits would you want to see in an engineer doing STRIDE analysis?
    • Checklists? What are good and bad sides of checklists?
    • Lists of attack scenarios customised for your organisation? Why would they be useful? Is it realistic to build such lists? How would you keep them from turning into checklists?
    • An example of facilitating questions is the Elevation of Privilege card game by Microsoft. OWASP Cornucopia is another such card game, from OWASP.
      • What would be the benefits and downsides of using such a game?
    • The IEEE S&P article (see reading list) documents one way in which the questions could be distilled into a list.
      • What do you think about this sort of list that actually cuts out STRIDE altogether?
  • What to do with the results
    • What do we do with the results and findings? - Risk analysis
    • Not everything you find is a problem.
    • Some problems you find may not be big enough risks.
    • You need to decide whether the cost of mitigation (fixing the issue) is greater than the risk (likelihood of it happening, and its impact). How this process is ultimately driven depends on your organisation and whether you have some sort of mandatory risk management process. However, from experience, here is some practical advice to you as a software engineer.
  • Risk is impact x likelihood. A typical thing the literature suggests calculating is the ”annual loss expectancy”, that is, how many times per year the issue is going to cost you, times how much it is going to cost you each time. (A small worked example follows this list.)
    • The problem is that in many cases, you have no data to base your likelihood estimate on. It’s mostly guesswork. Impact is usually easier to estimate.
    • In many cases, the cost of mitigation in planning stage is very small, so you can actually make architectural changes that are cheap and quick. However, sometimes there could be a large cost involved, and then you may need to ask someone who decides on how money should be spent. In corporate biz speak, you would “escalate”, i.e., ask the boss.
    • If you are an engineer (without a monetary ”approval limit”), make very clear that you do not make any large risk decisions! You are most likely not compensated (=paid) enough to make large risk decisions, and if things go sour, you might end up as the culprit.
    • If someone (your management) tells you that you should not fix an issue you identified, require them to explicitly sign off on the risk. This means, at a minimum, them sending an email that explicitly states that the risk is acceptable. If you think that the risk is large (typically, a product liability risk), it would be a good idea to print the email and store it personally.
    • If you end up in a position where you need to accept risks, ensure you are compensated for the responsibility you take. If you aren’t, provide first-class technical opinions, but do not allow yourself to become the one who gets fired or jailed.
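
    For illustration, a small worked example of the annual loss expectancy calculation mentioned above. The numbers are invented; in practice, the likelihood figure is the guesswork part.

      # Annual loss expectancy = (expected incidents per year) x (cost per incident)
      incidents_per_year = 0.2        # guesswork: roughly one incident every five years
      cost_per_incident = 50_000      # estimated clean-up and downtime cost

      annual_loss_expectancy = incidents_per_year * cost_per_incident
      cost_of_mitigation = 8_000      # one-off architectural change

      print(annual_loss_expectancy)   # 10000.0 per year
      # The mitigation pays for itself in under a year, so fixing looks justified -
      # but remember that the likelihood figure above was essentially a guess.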

Some Security Design Patterns

  • We will have a look at three security design patterns. These are definitely not a complete treatise on secure design, but they are perhaps the most common topics that help you to re-architect systems while doing security risk analysis.
  • The patterns are described here. We will show ”bad” architecture examples and then show how they can be made better by applying one or more of these three principles.
  • There are many patterns that are technology specific (dependent on some aspect of a used platform or framework, for example). One researcher has catalogued almost a hundred security design patterns and also classified them according to the STRIDE classification; please see the reading list.

Kerckhoffs’ Principle

  • The principle is today known as Kerckhoffs’ Principle (note: the person’s last name was Kerckhoffs, so the possessive is spelled thusly; the man had seven first names plus a nobiliary particle and a second surname, but let’s just cut corners here).
  • The actual principle, as it is currently understood, comes from his La Cryptographie militaire from 1883, and is a synthesis of two of his six requirements for encryption systems:
    • Requirement 2: If the attacker learns the inner workings of the system, that should not be a problem. In modern cryptographic terms, this means that the algorithm must be secure even if it is public.
    • Requirement 3 (actually only the latter part of the requirement): The key should be easily changeable. Back in his time, this mainly meant that the key needs to be distinct from the algorithm, that is, the algorithm itself is not the key.
  • Today, ”Kerckhoffs’ Principle” means that the only thing that needs to be kept secret about an encryption algorithm is the key. Everything else must be secure even if made public.
  • It helps a system’s architectural security if Kerckhoffs’ Principle is applied to the overall design of the system as well. In architectural risk analysis (session 4), if someone argues that ”the attacker wouldn’t know how we do things”, that argument should be rejected and countered as a violation of Kerckhoffs’ Principle. Any security control should be effective even if the attacker knew how it worked. Otherwise you are relying on obfuscation - see the discussion about it below.
  • One of the major benefits of following Kerckhoffs’ Principle in all architectural decisions is that it keeps the architectural design ”intellectually honest”. It makes bad security decisions and so-called security debt (a type of quality debt) visible, forcing the ”right thing” to be done instead of issues being swept under the carpet. This makes it a kind of meta-pattern that triggers other security activities.
  • Often, the most typical step in applying Kerckhoffs’ Principle to a design is to identify the location of encryption or authentication key material, and to ensure that this key material really needs to be known for the cryptographic operation to take place successfully. Once you have this logic verified (i.e., you are sure that without the key, an attacker cannot do something), you can apply an extra security boundary that wraps the key material. (For this, see the third security pattern; a small sketch follows below.)
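
    To make this concrete, a minimal sketch (in Python) of the principle in application code, using the publicly documented Fernet construction from the cryptography library. The environment variable name is an assumption made for illustration.

      import os
      from cryptography.fernet import Fernet

      # The algorithm (Fernet: AES in CBC mode plus an HMAC) is public and can be
      # discussed openly; per Kerckhoffs' Principle, only the key is secret. The key
      # is loaded from outside the code so it stays distinct from the algorithm and
      # can be changed without changing the system.
      key = os.environ["APP_ENCRYPTION_KEY"]   # e.g. created once with Fernet.generate_key()
      f = Fernet(key)

      token = f.encrypt(b"customer record")
      print(f.decrypt(token))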

Obfuscation as a security measure

  • An ”anti-pattern” to Kerckhoffs’ Principle is obfuscation. It means that something is made very difficult to understand or reverse-engineer, and some level of security is sought from the additional trouble the attacker has in understanding it.
  • In most cases, obfuscation should not be used as a security control. This doesn’t mean you should freely tell everyone how you have done security engineering, but the point is that the system should stay secure even if you did.
  • You are likely to meet obfuscation in cases where you have client-side code in a high-level language that, for some reason, someone has decided shouldn’t be readable - for example, JavaScript in a web UI. In most cases, the question to ask is what exactly the security target and the threat model are. In the worst case, you will find actual access credentials hidden in the obfuscated code.
  • Another place where obfuscation is often seen is in cryptographic algorithms where a DRM (Digital Rights (or Restrictions) Management) key is being used. There are obfuscated decryption algorithms where the decryption key does not need to reside in device memory all at one time during decryption, and obfuscation is used to frustrate memory dump attacks.
  • In most cases that I’ve seen, discussion about obfuscation just means that the security risk analysis is trying to address an issue that would require a large architectural change, but this change is not possible due to business reasons.
  • Reasons why obfuscation is not suitable as the only security practice:
    • Obfuscated code is hard to maintain. Code that has been obfuscated is by definition not readable, and security engineering practices that are based on code analysis become more difficult. Some obfuscation tools run in the build chain, so the source stays readable and this point is moot - but manual obfuscation would cause a lot of problems with maintenance.
    • Good obfuscation is surprisingly hard to do. If you do it manually, then you are potentially depending on certain persons who are able to do it (creating a personnel dependency). In many cases, naïve obfuscation can be completely bypassed because the code can be attacked through an alternate channel; e.g., wait until the system is in a state where, say, an encryption key is in memory, and then dump the process memory. Whatever obfuscation happened before this becomes irrelevant.
    • You are trusting your people with the ”keys” (the knowledge of how to compromise your system), but you have no real way of revoking this information if these people turn out to be untrustworthy. With proper engineering, you could just revoke those persons’ access.
  • There are, however, three valid reasons why obfuscation can be a part of a security control. It just needs to be understood what benefits it offers and what restrictions it has.
    • Obfuscation can act as insurance, if the obfuscation tool / service provider gives you guarantees that it won’t be reversed - essentially, they agree to pay you if it gets reversed. Some commercial obfuscation tool providers may do this. They should be prepared to cover the actual loss expectancy due to a breach. (If they don’t, avoid them - this means they won’t trust their own product for your specific business case.)
    • Obfuscation can be an additional filter for attackers, meaning that you only have to deal with the best ones (those that can either reverse, or bypass your obfuscation), or an attack might be only working against a part of your customer base (if you use different obfuscation for different customers). This won’t make your system more secure in absolute terms, though. It just could mean that the really good attackers won’t even bother, having more profitable targets elsewhere.
    • Obfuscation can act as time-based security for systems that are to be fielded for a limited time and become irrelevant soon after. The time it takes an attacker to figure out the obfuscation layer may be long enough that the system is already obsolete. However, beware of the tendency of information systems to stay around for much longer than originally planned!

Bootstrapping trust

  • One of the recurring questions in security engineering is a chicken-and-egg problem: in order to trust something, we often have to trust something else. At some point, we have to make a leap of faith, or ”take something for granted”.
  • The thing we ”take for granted” is known as the trust root. You might already be familiar with this from the world of certificates (such as X.509 certificates used in TLS and S/MIME) where there is a trusted party, a Certification Authority, that has a root certificate. The certificates presented by various web sites are tied back to this root, but the root itself is taken for granted - in many cases, it comes bundled with your operating system or browser.
  • The key to understanding bootstrapping trust is to understand the chain of trust that is being analysed, and where the ”buck stops”. Examples:
  • Certificates: A root certificate signs an intermediate certificate; an intermediate certificate signs the server certificate.
  • Secure boot: The BIOS and Trusted Platform Module (TPM) ensure that the operating system that is being booted has not been tampered with; the operating system verifies that the applications have not been tampered with; and the application then verifies that its data has not been tampered with. This way, a machine could (theoretically) be booted to a known state even after compromise, and you only have to replace the layer that was breached.
  • Each member of the trust chain performs an action called a measurement. This is usually a cryptographic one-way hash function calculated over the thing whose integrity is being attested. Before control is given to the next link in the trust chain, the previous link needs to check that the measurement has not changed. (A small sketch of this check appears at the end of this section.)
  • When you require this sort of trust chaining, there are some major things that need to be kept in mind:
  • The original point of trust that is ”taken for granted” may need to be highly secure. Depending on your threat model, you might need a hardware system that has anti-tampering measures in place. In many security systems, this means a smart card or an embedded security system. (For example, ARM has a brand called TrustZone.) Especially in systems that reside physically with an attacker - like payment terminals and mobile phones - you may need to resort to tamper-resistant hardware solutions. This may also require ”traditional” physical security measures such as tamper evidence (if there is someone to see the evidence).
  • Sometimes it is ok for the source of trust to be readable but not writable - like root certificates often are. You just need to know that the roots haven’t changed. However, it does happen that sometimes these roots need to be revoked - it has happened several times that a Certificate Authority has been found to be breached, and a root certificate needs to be removed. This sort of rekeying of trust roots may prove to be very costly especially in embedded systems or physical products, if you have not planned the update channel in advance. But remember that the update channel needs to be more trustworthy than the trust root that it is updating! Otherwise you have just created a back door.
  • Typically, when you move towards a more complex system - as in secure boot, where you first start with the TPM and the BIOS, and end up running a full-fledged server with virtualised OSes - the complexity of the code being run in the end means that the system may not actually be ”secure” or ”integrity protected”. The code may have vulnerabilities. However, the point of a trust chain is that if you pull the plug (and re-measure everything), the system can be brought back into a known (”trusted”) state. If you have patched the vulnerability and cleaned up, you have a theoretical way of recovering from an attack and of preventing the attacker from becoming persistent on the system.
  • Today, in most systems, you are not doing architectural risk analysis down to the first trust root. You will stop at some layer - typically, in many cases, the OS layer - and assume that the OS is secure and its security functionality actually works. This is for practical purposes often something you have to do. However, let this pattern remind you that you need to explicitly accept this risk. If you are building a system that cannot take that risk, you must still go deeper to find the chicken that laid that specific egg.
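
    For illustration, a minimal sketch (in Python) of the measurement step described above: hash the next stage and compare against a known-good value before handing over control. The file path and the expected hash are made-up placeholders.

      import hashlib

      # Known-good measurement of the next link in the chain (placeholder value).
      EXPECTED_SHA256 = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"

      def measure(path):
          with open(path, "rb") as f:
              return hashlib.sha256(f.read()).hexdigest()

      def hand_over_control(path):
          # The previous link refuses to continue if the measurement has changed.
          if measure(path) != EXPECTED_SHA256:
              raise RuntimeError("next stage has been tampered with, refusing to continue")
          print("measurement OK, handing over control to", path)

      hand_over_control("/boot/next_stage.img")   # hypothetical path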

Abstracting away problems through adding security boundaries

  • You will find some ”naturally occurring” security boundaries in your design. At a minimum, you would usually see that a server is run within an OS (although other programs could also run in there) and that your application probably is a process within that OS (so, if you aren’t running on a ‘90s home computer, other processes probably cannot just freely mess around with your memory).
  • However, there is a very powerful architectural analysis technique where you will encapsulate security problems by creating new security boundaries.
  • This can be used as an optimisation: If you cannot make something secure enough (there’s always that little nagging doubt), you can just wrap that into a security domain, technically enforce that domain, and then you don’t have to worry (that much) about what happens in that security domain. What happens in Vegas, stays in Vegas.
  • Of course, what you have to remember is that the outputs from that security domain you introduced may not be trustworthy, if the attacker compromised that domain.
  • The canonical example of this thinking and pattern is qmail, a Mail Transport Agent (that is, a mail server) by Daniel J. Bernstein. The qmail web page claims it has not had a security vulnerability since 1997, even though it has an ongoing bug bounty program.
  • qmail consists of several processes that are mutually untrusted. If someone were to somehow find a vulnerability in a process that handles incoming mail, that process would be the extent of the attacker’s reach. The rest of qmail would not be compromised.
  • Typically, if you have to parse a data format (such as an incoming protocol), you could make use of the ”qmail thinking” by extracting the parsing part - which, as we saw in Session 1, would be prone to vulnerabilities - into a separate process that is run with few permissions. This is what we described in Session 2. (A small sketch of this follows at the end of this section.)
  • Typically, you would use various OS- and platform-provided methods to add these security boundaries, or to compartmentalise:
    • On UNIX-type systems, you can have different processes running under different users, and segregate file access with users and groups;
    • again, on POSIX (and, for example, Linux), you have ”capabilities” which are further permissions that can be granted to some processes but not others;
    • you can use a Mandatory Access Control system such as AppArmor to enforce which sort of actions executables can do;

    As a side note, the Docker project is a very interesting development that leverages the Linux kernel security features (including AppArmor or SELinux) and offers the above as a fairly easy-to-use package. See the libcontainer project.

    • you can run things inside a Virtual Machine - either a VM that runs an OS, or a bytecode interpreter that has some protection, like a Java VM. The hypervisors can offer further access controls for what the programs running inside them can access;
    • or isolate parts of a computation on a restricted system that can only talk to you through a limited connection. Some systems do key operations on devices that only link over a serial connection. A smart card (e.g., your SIM card) is such a thing;
    • or you can do ”air-gapping”, which is a low-tech but very effective way of ensuring that something is integrity protected. Save it on a filesystem on a disk, unmount the file system and put it into a safe.
  • You will have to remember the data flow diagram from Session 4, though. Any security domains that are within other security domains will be compromised if the outer domain is compromised, at least to the level of a Denial of Service attack, if not worse. An example: If your virtual machine hypervisor gets compromised, everything within that hypervisor is compromised - even if you would have split the processing into several security domains. If your host is compromised, all Docker containers within it will also be compromised. And so on.
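
    To make this concrete, a minimal sketch (in Python) of the ”qmail thinking” mentioned above: run the risky parsing step in a separate worker process, so a crash or compromise in the parser stays within that process’s boundary. The worker script name is hypothetical; in a real deployment the worker would additionally run as a different user, in a container, or under a MAC profile.

      import subprocess
      import sys

      def parse_untrusted(data: bytes) -> bytes:
          # Hand the untrusted input to a separate, short-lived worker process.
          # "parse_worker.py" is a hypothetical helper that reads stdin, parses it,
          # and writes a sanitised result to stdout.
          result = subprocess.run(
              [sys.executable, "parse_worker.py"],
              input=data,
              capture_output=True,
              timeout=5,        # a hung parser cannot stall the main process forever
          )
          if result.returncode != 0:
              raise ValueError("parser rejected or choked on the input")
          return result.stdout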

Aggressive attack surface reduction through Unikernels

  • In the above discussion, we referred to the Docker project. Currently Docker could be seen as an easy way to package Linux kernel features to create additional security boundaries.
  • The other major consideration in Docker is that all the dependencies that an application requires will be bundled into the container, and can be deployed and run together.
  • This aspect can be taken further and converted into a security aspect by requiring that only the necessary dependencies are bundled. As we remember from earlier lectures, dropping code out of the system also usually decreases the attack surface, because it removes bugs that were in that code.
  • Taken to the extreme, one could create a deployable application that brings with it all the dependencies it needs, but nothing more. In essence, only those parts of the underlying OS that are required are included. As an example, if the application does not need UDP/IP, only the TCP part of the IP stack would be included.
  • The application plus all its dependencies could be compiled into one unit, which is directly executable on hardware or on a hypervisor, without an underlying OS. This unit is known as a unikernel.
  • Unikernels could be implemented, e.g., using a set of libraries that implement specific (userspace and kernel) functionality, or using microkernels, where only the required functionality is included in the kernel.

Reading list

This is a list of useful documents that will enhance your understanding of the course material. The reading list is session-by-session. “Primary” material means something you would be expected to read if you are serious about the course, and may help you to do the weekly exercise; “Additional” material you may want to read if you would like to deepen your understanding on a specific area.

Primary material

Additional material

  • Ross Anderson: Security Engineering, 2nd Edition, 2008. This is a great book, and fun to read, although you could probably build a house using it as a brick. If you want to get serious about security engineering - especially systems which have unique properties, and not just run-of-the-mill web apps - then I would really recommend reading this book in its entirety at some convenient time. For the purposes of this session, the following chapters have interesting background:
    • For the design patterns: Chapter 8, Multilevel Security, and Chapter 9, Multilateral Security.
    • For the security-through-obscurity designs: Chapter 22, Copyright and DRM.
  • On the subject of using crypto properly, I can recommend Ferguson et al.: Cryptography Engineering. Part I, Chapter 1: Introduction and Chapter 2: Introduction to Cryptography discuss using cryptography in the context of the system’s architectural risk model. The book is very good otherwise as well; if you are planning a career in this field, I recommend you get your hands on it (as well as Anderson’s Security Engineering).
  • A paper that looks at design principles behind engineering systems that need to provide evidence about how the systems were used or how they behaved. This has a lot to do with openness and a clear definition of a Trusted Computing Base. Steven J. Murdoch & Ross Anderson: Security Protocols and Evidence: Where Many Payment Systems Fail, 2014.
  • Kerckhoffs enters the stage in David Kahn’s excellent history of symmetric cryptography, Codebreakers, chapter 8: ”The Professor, The Soldier, And The Man On Devil’s Island”. Codebreakers is about a thousand pages, but a fun read. Unfortunately it stops short of making any serious attempt at any post-World War II crypto, including the public key algorithms.
  • Danny Dhillon: Developer-Driven Threat Modeling. IEEE Security & Privacy, Jul-Aug 2011. Available also at http://www.infoq.com/articles/developer-driven-threat-modeling. This article tells the story of how EMC started to do security risk analysis, use data flow diagrams, and how they ended up scaling it. You can feel the challenge of not having security folks in every team; EMC describes using a “threat library”, which approaches a checklist.
  • Take a quick glance at Munawar Hafiz’s (Assistant Professor at Auburn University) Security Pattern Catalog. He lists almost a hundred security patterns with explanations, some fairly technology specific and others high-level.
  • The rest of Adam Shostack: Threat Modeling: Designing for Security. If you are planning to start threat modelling at a company, this is good reading, although much more than this course would cover.

Endnotes

This is lecture support material for the course on Software Security held at Aalto University, Spring 2018. It is not intended as standalone study material.

Created by Antti Vähä-Sipilä <avs@iki.fi>, @anttivs. Big thanks to Sini Ruohomaa and Prof. N. Asokan.
