

Available research topics for students

This page lists the research topics that are currently available in the Secure Systems group. Each topic can be structured either as a special assignment or as an MSc thesis, depending on the interests, background and experience of the student. If you are interested in a particular topic, send an e-mail to the listed contact person (as well as to the responsible professor, either N. Asokan or Tuomas Aura) explaining your background and interests.

All topics have one or more of the following keywords: 

PLATSEC Platform Security

BCON Blockchains and Consensus

ML & SEC Machine learning and Security/Privacy

USABLE Usable security and stylometrics

OTHER Other systems security research themes


Detecting image defects and using Ersatz data on autonomous vehicle benchmarks, in the presence of adversaries ML & SEC

Computer vision has become an important requirement in autonomous vehicle and drone scenarios. Such systems may rely on several sensors to function correctly, and by default a defect in any of them may require the system to stop operating immediately in order to avoid damage to the system and its environment. Dealing with sensor defects autonomously may be difficult in certain scenarios: a sensor fault on a highway may require the driver to take control of the vehicle to avoid an accident.

Recently, Generative Adversarial Networks (GANs) have been applied to image inpainting with remarkable performance [1]. One of the proposed application areas for this technology is autonomous vehicles: the system may detect a sensor failure and compensate for it in the short term with Ersatz (substitute) data, so that the short-term operational objective (e.g. detection of road signs) is not compromised.

However, directly applying inpainting to images in autonomous vehicles may endanger the passengers and the environment. To quantify the risk of using Ersatz data, the student will evaluate several state-of-the-art computer vision models on the KITTI vision benchmark suite [2], using input images that have been partially corrupted and then repaired with inpainting techniques.

Part of the master's thesis involves threat modeling: how can a malicious entity influence the system so that it misclassifies situations? For example, if the adversary can only influence a few regions of the image, how should those regions be modified? What happens if users contribute to building the Ersatz model? Can back-gradient optimization [5] be used by a malicious data contributor to poison the Ersatz model and violate its integrity, e.g. teach the inpainting algorithm to fail at reconstructing images in specific scenarios? More information on image completion can be found in [3, 4].
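The question of an adversary who controls only a few image regions can be explored with a very small experiment first. The sketch below is a minimal illustration under stated assumptions: a toy logistic-regression classifier on a flattened 8x8 image stands in for the KITTI models, and a fast-gradient-sign (FGSM-style) step is applied only inside a binary mask marking the pixels the adversary controls. The weights, the mask and the step size `eps` are all hypothetical.

```python
import numpy as np

def masked_fgsm(x, y, w, b, mask, eps):
    """One FGSM-style step that only modifies pixels where mask == 1.

    x: flattened image with values in [0, 1]; y: true label (0 or 1);
    w, b: weights and bias of a logistic-regression classifier;
    mask: 0/1 array marking the region the adversary can influence.
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))    # predicted P(y = 1 | x)
    grad = (p - y) * w                        # gradient of the cross-entropy loss w.r.t. x
    x_adv = x + eps * np.sign(grad) * mask    # move only the controlled pixels
    return np.clip(x_adv, 0.0, 1.0)

def predict(x, w, b):
    return int(w @ x + b > 0)

rng = np.random.default_rng(0)
d = 64                                        # 8x8 image, flattened row by row
w = rng.normal(size=d)
b = -0.5 * w.sum()                            # chosen so that the clean input is class 1
x = 0.5 + 0.05 * np.sign(w)                   # clean input: logit = 0.05 * sum(|w|) > 0
mask = np.zeros(d)
mask[:32] = 1.0                               # adversary controls the top half only

x_adv = masked_fgsm(x, 1, w, b, mask, eps=0.3)
print(predict(x, w, b), predict(x_adv, w, b))
```

Running the same region-restricted perturbation on raw and on inpainted inputs would be one way to start comparing the two threat surfaces.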

Requirements: Computer vision and deep learning fundamentals. Information security. PyTorch knowledge is a bonus.

[1] Iizuka, Satoshi, Edgar Simo-Serra, and Hiroshi Ishikawa. "Globally and locally consistent image completion." ACM Transactions on Graphics (TOG) 36.4 (2017): 107.

[2] Geiger, Andreas, Philip Lenz, and Raquel Urtasun. "Are we ready for autonomous driving? The KITTI vision benchmark suite." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

[3] Huang, Chenduo, and Koki Yoshida. "Evaluations of Image Completion Algorithms: Exemplar-Based Inpainting vs. Deep Convolutional GAN." http://cs231n.stanford.edu/reports/2017/pdfs/306.pdf

[4] Brandon Amos. "Image Completion with Deep Learning in TensorFlow". http://bamos.github.io/2016/08/09/deep-completion/

[5] Muñoz-González, Luis, et al. "Towards poisoning of deep learning algorithms with back-gradient optimization." arXiv preprint arXiv:1708.08689 (2017).


For further information: Please contact Mika Juuti (E-mail: mika.juuti@aalto.fi) and prof. N. Asokan.


Attack-resilient feature selection method for ML security applications ML & SEC

Machine learning (ML) is increasingly used in security applications. In contrast to many other applications, such as image or speech recognition, security has specific characteristics; most notably, the presence of an attacker is a given. Attackers will try to fool a learned model by crafting samples that are meant to be misclassified. Machine learning methods used for security applications must therefore take this factor into account.

Typically, before an ML model is learned, a feature extraction and selection process is applied to obtain relevant features. Feature selection methods are usually based on computing information gain, and aim to discard features that carry little information as well as redundant features that bring similar information. This process considers only the impact a feature has on classification. However, in an adversarial context, some features may be easier for an attacker to manipulate than others. Discarding features that carry little information but are hard to manipulate may make the model more vulnerable to attacks.

In this context, the goal of this project is to define a feature selection process that takes the manipulability of features into account as a criterion for ranking the most relevant features for a classification task. Another parameter that can be weighted in the selection process is the computation time of each feature, since speed is usually critical in security applications. This selection method can later be used to define a two-step classification process: features that are hard to manipulate and cheap to compute give a first, approximate decision, which can trigger a second, more costly and more accurate step that renders the final decision about the instance being classified. Such a technique would be more resilient to attackers than classical methods.
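The selection criteria and the two-step process described above can be sketched as follows. This is a minimal illustration, not a proposed method: information gain is computed for discrete features, while the linear score combining gain, manipulability (0 = hard to change, 1 = trivial) and cost, its weights `alpha` and `beta`, the feature names and the confidence thresholds of the cascade are all hypothetical choices.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def rank_features(gain, manipulability, cost, alpha=1.0, beta=0.1):
    """Hypothetical linear score: reward information gain, penalise features an
    attacker can easily change and features that are expensive to compute."""
    score = {f: gain[f] - alpha * manipulability[f] - beta * cost[f] for f in gain}
    return sorted(score, key=score.get, reverse=True)

def two_step_classify(x, cheap_model, costly_model, low=0.2, high=0.8):
    """First step: a cheap model on robust features decides confident cases.
    Uncertain cases (low < p < high) trigger the costly, more accurate model."""
    p = cheap_model(x)
    if p <= low:
        return 0
    if p >= high:
        return 1
    return costly_model(x)

labels = [0, 0, 1, 1]
print(information_gain([0, 0, 1, 1], labels))   # 1.0: the feature determines the label
print(information_gain([0, 1, 0, 1], labels))   # 0.0: the feature is independent of it
print(rank_features({"url_length": 0.6, "domain_age": 0.4},
                    {"url_length": 0.9, "domain_age": 0.1},
                    {"url_length": 0.0, "domain_age": 0.0}))
```

In the last call the more informative but easily manipulated feature drops below the harder-to-manipulate one, which is the kind of trade-off the project would study with real manipulability estimates.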

Requirements:

- knowledge of supervised machine learning algorithms, statistical analysis and feature selection methods.

- sufficient programming skills in e.g. Python, Matlab or C to implement ML algorithms

Resources:

[1] Miller, Brad, et al. "Adversarial active learning." Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop. ACM, 2014.

[2] Huang, Ling, et al. "Adversarial machine learning." Proceedings of the 4th ACM workshop on Security and artificial intelligence. ACM, 2011.

[3] Guyon, Isabelle, and André Elisseeff. "An introduction to variable and feature selection." Journal of machine learning research 3.Mar (2003): 1157-1182.

[4] Blum, Avrim L., and Pat Langley. "Selection of relevant features and examples in machine learning." Artificial intelligence 97.1 (1997): 245-271.

For further information: Please contact Samuel Marchal (samuel.marchal@aalto.fi) or Prof. N. Asokan.


__________________________________________________________________________________________________________________________________________________________________________________

Distributed Editing in Delay-Tolerant Networks PLATSEC

Delay-Tolerant Networks (DTNs) [1] are networks based on the store-carry-forward principle: nodes store messages, carry them as they move, and forward them when they meet other nodes, thereby forming an ad hoc overlay network. Unlike in traditional networks, end-to-end connectivity therefore does not always have to exist. Messages (called bundles) are self-contained data units. Applications for DTNs should be developed with these principles in mind.
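The store-carry-forward principle can be illustrated with a toy simulation. This sketch assumes epidemic routing (every node copies every bundle it has not yet seen) as the forwarding strategy, which is only the simplest option and is not prescribed by RFC 4838; the node names and the contact schedule are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Bundle:
    bundle_id: str
    src: str
    dst: str
    payload: bytes

@dataclass
class Node:
    name: str
    buffer: dict = field(default_factory=dict)      # bundle_id -> Bundle (store and carry)
    delivered: list = field(default_factory=list)   # bundles addressed to this node

    def receive(self, bundle):
        if bundle.bundle_id in self.buffer:          # already stored: ignore duplicate
            return
        self.buffer[bundle.bundle_id] = bundle
        if bundle.dst == self.name:
            self.delivered.append(bundle)

def contact(a, b):
    """Forward: when two nodes meet, each copies the bundles the other lacks."""
    for bundle in list(a.buffer.values()):
        b.receive(bundle)
    for bundle in list(b.buffer.values()):
        a.receive(bundle)

nodes = {name: Node(name) for name in "ABC"}
nodes["A"].receive(Bundle("b1", "A", "C", b"edit #1"))
# A and C never meet; B carries the bundle from one contact to the next.
for u, v in [("A", "B"), ("B", "C")]:
    contact(nodes[u], nodes[v])
print([b.payload for b in nodes["C"].delivered])   # [b'edit #1']
```

Note that the intermediate node B ends up holding a copy of the edit, which is exactly why the security mechanisms discussed next are needed.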

This assignment focuses on implementing a "Google Docs" for DTNs. Security considerations are of the utmost importance, since documents may be processed on nodes that the document owner does not trust. Attribute-Based Encryption [2] and Proxy Signatures [3] are two schemes that can be used to delegate editing capabilities to nodes the owner trusts and to limit the viewing and editing capabilities of other nodes.
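In the Attribute-Based Encryption scheme of [2], decryption succeeds only if a set of attributes satisfies an access tree built from threshold gates. The sketch below shows only that access-structure logic, with hypothetical attribute names; it performs no cryptography, and in a real system it is the encryption itself, not a check like this, that enforces the policy on untrusted nodes.

```python
def satisfies(policy, attributes):
    """Return True if the attribute set satisfies the access structure.

    policy is either an attribute name (a leaf) or a tuple
    ("threshold", k, children): at least k children must be satisfied.
    AND over n children is k = n; OR is k = 1.
    """
    if isinstance(policy, str):
        return policy in attributes
    _, k, children = policy
    return sum(satisfies(child, attributes) for child in children) >= k

# Hypothetical policy: may edit if the node belongs to the thesis document AND
# holds either the editor or the owner role.
edit_policy = ("threshold", 2, ["doc:thesis", ("threshold", 1, ["role:editor", "role:owner"])])

print(satisfies(edit_policy, {"doc:thesis", "role:editor"}))   # True
print(satisfies(edit_policy, {"doc:thesis", "role:viewer"}))   # False
print(satisfies(edit_policy, {"role:owner"}))                  # False
```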

Requirements:

  • Good skills in C/C++.
  • Experience with distributed systems.
  • Understanding of basic crypto primitives.

References:

  1. RFC4838. Delay-Tolerant Networking Architecture.
  2. Attribute-based Encryption for Fine-grained Access Control of Encrypted Data. V. Goyal, O. Pandey, A. Sahai, B. Waters.
  3. Secure Proxy Signature Schemes for Delegation of Signing Rights. A. Boldyreva, A. Palacio, B. Warinschi.

For further information: Please contact Arseny Kurnikov (Email: arseny.kurnikov@aalto.fi) or Prof. N. Asokan.

________________________________________________________________________________________________________________________________________________________________________________________________

Distributed Networks of SGX Enclaves PLATSEC

Software Guard eXtensions (SGX) is Intel's implementation of a Trusted Execution Environment (TEE). It introduces the notion of an enclave: a part of an application that is cryptographically isolated from other system components, including privileged ones such as the hypervisor or the operating system kernel. We have developed an enclave that processes login attempts and therefore requires rate limiting.

When deploying the application, several such enclaves are needed for elasticity and fault tolerance. The maximum allowed rate should be shared among the enclaves so that an adversary does not get a higher guessing rate by probing several enclaves at once. The challenge is to design a scheme that allows the maximum rate to be split and recombined among many enclaves on the fly. Rate limiting is just one example; other scenarios might also need a distributed network of enclaves that share state and require consistency. Depending on how broad the scope gets, this topic can be either a special assignment or a thesis topic.
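The invariant such a scheme must preserve (the shares never add up to more than the global maximum) can be sketched in a few lines. This is a centralized simplification assuming an integer budget of login attempts per time window; the function and class names are hypothetical and no SGX APIs are used.

```python
def split_rate(max_rate, enclave_ids):
    """Split an integer global budget (login attempts per time window)
    as evenly as possible; the shares always sum to exactly max_rate."""
    ids = sorted(enclave_ids)
    if not ids:
        return {}
    base, extra = divmod(max_rate, len(ids))
    return {eid: base + (1 if i < extra else 0) for i, eid in enumerate(ids)}

class EnclaveLimiter:
    """Per-enclave counter that enforces the share it was granted."""

    def __init__(self, share):
        self.share = share
        self.used = 0

    def try_login(self):
        if self.used >= self.share:
            return False
        self.used += 1
        return True

def guesses_when_probing_all(limiters, attempts_per_enclave):
    """Successful attempts an adversary gets by probing every enclave at once."""
    return sum(sum(lim.try_login() for _ in range(attempts_per_enclave)) for lim in limiters)

shares = split_rate(10, ["e1", "e2", "e3"])
print(shares)                                    # {'e1': 4, 'e2': 3, 'e3': 3}
limiters = [EnclaveLimiter(s) for s in shares.values()]
print(guesses_when_probing_all(limiters, 100))   # 10
print(split_rate(10, ["e1", "e2", "e3", "e4"]))  # {'e1': 3, 'e2': 3, 'e3': 2, 'e4': 2}
```

When an enclave joins (the last call), the existing enclaves must shrink their shares before the newcomer starts using its own; otherwise the sum can temporarily exceed the global maximum. Making such transitions safe without relying on a single trusted coordinator is the actual challenge of the topic.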

Requirements:

  • C programming skills.
  • Basic understanding of security mechanisms.

References:

  1. Innovative Technology for CPU Based Attestation and Sealing. I. Anati, S. Gueron, S.P. Johnson, V.R. Scarlata
  2. Mitigating Password Database Breaches with Intel SGX. H. Brekalo, R. Strackx, F. Piessens.
  3. Intel(R) Software Guard Extensions SDK Developer Reference.

For further information: Please contact Arseny Kurnikov (Email: arseny.kurnikov@aalto.fi) or Prof. N. Asokan.

_______________________________________________________________________________________________________________________________________________________________________________________________________

Proofs of Good Behaviour PLATSEC


Summary: Using trusted hardware to reduce spam and denial of service attacks. 


As of March 2017, spam messages accounted for over 56 percent of e-mail traffic worldwide [1]. In 2016, the Mirai botnet, consisting of a large number of compromised IoT devices, was used to perform several high-profile Distributed Denial of Service (DDoS) attacks [2]. These examples illustrate the costs and consequences of the current Internet and Web architectures, in which all clients are treated equally. In the past, it was generally not possible for an email or web server to know much about an incoming email or web request, since virtually all information, including the source address, could be spoofed.

However, it has already been shown that trusted hardware and remote attestation can be used to change this situation. For example, Not-a-Bot [3] used a small trusted execution environment to monitor certain hardware (e.g. mouse and keyboard) and provide a proof of human presence. Human presence, however, is only one of the factors that could be used to decide whether a request is legitimate.

The aim of this project is to develop a suite of "proofs of good behaviour" (PoGBs) that clients can use in different circumstances. For example, when sending an email, a simple PoGB could be an authentic statement of the number of emails sent by that client in the past hour. When requesting a web page, an appropriate PoGB might be a proof of human presence, as in Not-a-Bot. The authenticity of a PoGB can be assured using modern trusted hardware (e.g. Intel SGX and ARM TrustZone) and remote attestation.
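The e-mail example above can be sketched as a statement produced by a trusted component and checked by the mail server. This is a minimal sketch under loud assumptions: the class is assumed to run inside a TEE that observes every outgoing e-mail, and a shared HMAC key stands in for real remote attestation, which would use an asymmetric attestation key and a verifier-supplied nonce for freshness. The names, key and spam threshold are hypothetical.

```python
import hashlib
import hmac
import json

class TrustedMailCounter:
    """Assumed to run inside a TEE that observes every outgoing e-mail.
    An HMAC key stands in for the attestation key of real hardware."""

    def __init__(self, key):
        self._key = key
        self._sent = []                          # timestamps (seconds) of sent e-mails

    def record_send(self, now):
        self._sent.append(now)

    def proof(self, now, window=3600):
        count = sum(1 for t in self._sent if now - window < t <= now)
        statement = json.dumps({"sent_last_hour": count, "time": now}, sort_keys=True).encode()
        tag = hmac.new(self._key, statement, hashlib.sha256).hexdigest()
        return statement, tag

def verify_pogb(key, statement, tag, max_sent=100):
    """Mail server side: accept only authentic statements below a spam threshold."""
    expected = hmac.new(key, statement, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False
    return json.loads(statement)["sent_last_hour"] <= max_sent

key = b"hypothetical attestation key"
counter = TrustedMailCounter(key)
for t in (100, 2000, 5000):
    counter.record_send(t)
statement, tag = counter.proof(now=5400)         # counts sends in (1800, 5400]
print(statement, verify_pogb(key, statement, tag))
forged = statement.replace(b'"sent_last_hour": 2', b'"sent_last_hour": 0')
print(verify_pogb(key, forged, tag))             # False: the tag no longer matches
```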


Required skills:
- Good knowledge of basic information security concepts
- Critical thinking

Nice to have:
- Experience with C/C++ programming, and/or web technologies

References:

For further information: Please contact Andrew Paverd (Email: andrew.paverd@aalto.fi) and Prof. N. Asokan. Your application will receive special consideration if you suggest any new PoGB not mentioned above.

____________________________________________________________________________________________________________________________________________________________________________________________________

Reserved Research Topics


Meet-in-the-Middle (MitM) Attacks: Analysis of the Collision Properties and the Influence of Simon's Algorithm (reserved)


Symmetric-key cryptographic algorithms such as block ciphers are part of security protocols and are widely used to secure all types of communication. Among the best-known algorithms, the AES cipher is considered one of the most secure. In recent years, researchers have found weaknesses in reduced-round versions of this primitive. In the single-key setting, the most powerful attacks are based on the well-known meet-in-the-middle attack [1,4]. The complexity of this attack depends on the problem of finding collisions between certain intermediate values [2,3,4]. While the classical estimate for this problem comes from the birthday paradox, the actual results might differ slightly from the expected ones because of the particular structure of the messages derived from the cipher structure. A large part of the project will therefore consist of implementing a toy cipher and analyzing the results of our simulations.
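As a starting point for the toy-cipher experiments, here is a minimal sketch of the textbook MitM setting: double encryption with two independent 8-bit keys. The cipher (key XOR followed by a fixed random 8-bit S-box) is hypothetical and far simpler than the reduced-round AES attacks of [4]. The attack costs about 2·2^8 cipher evaluations instead of 2^16 for exhaustive search, and one known pair leaves about 2^16/2^8 = 256 candidate key pairs; here the count is exactly 256 because of the cipher's simple structure, and measuring how far such counts deviate from the generic estimate is the kind of analysis the project proposes.

```python
import random

SIZE = 256                                   # 8-bit blocks and 8-bit keys
_rng = random.Random(2024)
SBOX = list(range(SIZE))
_rng.shuffle(SBOX)                           # hypothetical fixed S-box (a permutation)
INV_SBOX = [0] * SIZE
for i, s in enumerate(SBOX):
    INV_SBOX[s] = i

def enc(k, x):
    """One round of the toy cipher: XOR the key, then apply the S-box."""
    return SBOX[x ^ k]

def dec(k, y):
    return INV_SBOX[y] ^ k

def double_enc(k1, k2, x):
    return enc(k2, enc(k1, x))

def mitm(pairs):
    """Meet-in-the-middle on double encryption using known (plaintext, ciphertext) pairs."""
    p0, c0 = pairs[0]
    forward = {}                             # middle value -> list of k1 producing it
    for k1 in range(SIZE):
        forward.setdefault(enc(k1, p0), []).append(k1)
    candidates = [(k1, k2)                   # collisions between the two halves
                  for k2 in range(SIZE)
                  for k1 in forward.get(dec(k2, c0), [])]
    survivors = [(k1, k2) for k1, k2 in candidates
                 if all(double_enc(k1, k2, p) == c for p, c in pairs[1:])]
    return candidates, survivors

secret = (0x3A, 0xC5)
pairs = [(p, double_enc(*secret, p)) for p in (0x00, 0x11, 0x22, 0x33)]
candidates, survivors = mitm(pairs)
print(len(candidates))        # 256 = 2^16 key pairs / 2^8 middle values
print(secret in survivors)    # True
print(2 * SIZE, SIZE * SIZE)  # cipher calls: MitM tables vs exhaustive search
```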

Depending on how the project evolves, it would be interesting to study how quantum computing will influence the speed of this attack. The prospect of quantum computing is a threat to cryptographic algorithms. While its impact on asymmetric primitives is huge, for symmetric primitives such as block ciphers it has been commonly believed that doubling the key size provides the same security level. Recently, it has been shown that the situation is more complicated than that (see the results on differential and linear cryptanalysis [5]), and the impact of quantum computing on all known attacks should be analyzed. In the second part of the project, I recommend studying the impact of Simon's quantum algorithm on MitM attacks.
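For orientation, the standard query complexities behind this discussion are summarized below. Grover's search is what the "double the key size" rule of thumb rests on, while Simon's algorithm gives an exponential speed-up whenever an attack can be phrased as finding a hidden XOR period s; whether and how MitM attacks can be phrased this way is the open question of the second part.

```latex
\begin{aligned}
\text{Exhaustive key search, } k\text{-bit key:}\quad
  & \text{classical} \approx 2^{k} \text{ evaluations}, \qquad
    \text{Grover} \approx 2^{k/2} \text{ evaluations}\\[4pt]
\text{Simon's problem:}\quad
  & \text{given } f:\{0,1\}^n \to \{0,1\}^n \text{ with } f(x) = f(y) \iff y \in \{x,\ x \oplus s\}, \text{ find } s\\
  & \text{classical } \Theta(2^{n/2}) \text{ queries}, \qquad \text{Simon's algorithm } O(n) \text{ queries}
\end{aligned}
```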

The main tasks of this project would be to:

  • Understanding MitM attacks and implementing one on a toy cipher.
  • Comparing the theoretical (time) complexity of the attack with the one obtained from our implementation.
  • In case of a difference, trying to derive an appropriate statistical model that explains it.

Extra: Understanding Simon's algorithm and analyzing its influence on the speed of a MitM attack.

Requirements:

  • Basic knowledge of cryptography, of ciphers such as DES and AES.
  • Basic programming skills.
  • Some understanding of the birthday paradox problem.

Resources:

[1] https://en.wikipedia.org/wiki/Meet-in-the-middle_attack

[2] Mohamed Tolba, Amr M. Youssef:
Generalized MitM attacks on full TWINE. Inf. Process. Lett. 116(2): 128-135 (2016)

[3] Anne Canteaut, María Naya-Plasencia, Bastien Vayssière:
Sieve-in-the-Middle: Improved MITM Attacks. CRYPTO (1) 2013: 222-240

[4] Patrick Derbez, Pierre-Alain Fouque, Jérémy Jean:
Improved Key Recovery Attacks on Reduced-Round AES in the Single-Key Setting. EUROCRYPT 2013: 371-387

[5] Marc Kaplan, Gaëtan Leurent, Anthony Leverrier, María Naya-Plasencia:
Quantum Differential and Linear Cryptanalysis. IACR Trans. Symmetric Cryptol. 2016(1): 71-94 (2016)

For further information: Please contact Céline Blondeau (E-mail: celine.blondeau@aalto.fi) and Prof. N. Asokan.

_____________________________________________________________________________________________________________________________________________________________________________________

MLaaS model stealing attacks and defenses (reserved) ML & SEC


Machine Learning as a Service (MLaaS) is a new service paradigm, which outsources machine learning models to cloud service providers. MLaaS clients query the cloud-hosted ML models for predictions, which are typically charged a small fee.

Cloud-deployed models are, however, subject to so-called model stealing (or model extraction) attacks [1, 2], where a malicious user tries to obtain an exact replica (if it is a linear model) or a near-replica of the MLaaS model. This attack typically relies on crafting synthetic samples and obtaining predictions for them from the original model. The replica model is then learned from the resulting labeled data. The attacker can subsequently use the replica model to obtain predictions for free.
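For a linear model that returns confidence scores, the exact replica mentioned above can be obtained by equation solving, in the spirit of the attack of Tramèr et al. [1]. The sketch below assumes a hypothetical logistic-regression victim that returns P(y = 1 | x): since log(p / (1 - p)) = w·x + b, any d + 1 queries in general position determine the d + 1 unknowns exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
w_secret = rng.normal(size=d)        # hypothetical victim model parameters
b_secret = 0.7

def prediction_api(X):
    """Stand-in for the MLaaS endpoint: returns P(y = 1 | x) for each row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ w_secret + b_secret)))

# Attacker: d + 1 random queries yield d + 1 linear equations
#   log(p / (1 - p)) = w . x + b   in the d + 1 unknowns (w, b).
X = rng.normal(size=(d + 1, d))
p = prediction_api(X)
logits = np.log(p / (1.0 - p))
A = np.hstack([X, np.ones((d + 1, 1))])
solution = np.linalg.solve(A, logits)
w_stolen, b_stolen = solution[:d], solution[d]

print(np.allclose(w_stolen, w_secret), np.isclose(b_stolen, b_secret))
```

For non-linear models such as neural networks this shortcut no longer applies, which is where the substitute-model training of [2] comes in.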

Andrew Ng famously said: "It is not the one who has the best algorithm that wins. It's the one who has the most data." Attackers are at a constant disadvantage compared with large organizations that have large datasets: if lucky, an attacker may have access to a few examples of each class. Attackers resort to model stealing as a means to offset this disadvantage. In many cases, attackers are free to choose up to a few thousand queries to send to the server, which provide them with extra samples to complement their initial data. The goal of the attacker is to carefully choose or generate the samples to query, so as to make as few queries to the server as possible while learning a replica model that reproduces the decisions of the original model as faithfully as possible.

Below are the main tasks that will be addressed in the thesis work, and some qualifiers for an excellent thesis. 


* Main tasks in thesis:

** Reference implementations of two state-of-the-art model stealing attacks, and evaluation on reference datasets (MNIST, CIFAR, German Traffic Sign) of the quality of the replica model compared with the original one (agreement of predictions on the test set and on random samples)

*** Tramèr et al. [1] attack for full recovery of the attacked model

*** Papernot et al. [2] attack for locally faithful recovery of the attacked model


** Experimental analysis of synthetic sample generation techniques

*** Targeted and non-targeted attacks

*** Experimentation with novel generation methods: following the gradient towards several target classes (not only one), GANs, and the use of training samples with assumed labels that do not require predictions from the original model (training sample augmentation with intermediate points on query trajectories)


** Detection of model stealing attacks as they occur

*** Leverage the fact that model stealing requires numerous requests

*** Comparison with state-of-the-art techniques that target the detection of single adversarial samples


** Implementation of a defence strategy:

*** adding noise / randomness to predictions for detected attackers

*** evaluating the impact of noisy predictions on model stealing attacks


* Extra qualifiers for an excellent thesis:

** extending the work in a new direction

*** how to deal with having no initial data?

*** proposing novel techniques for detecting adversarial examples and model stealing attacks as they occur

*** proposing novel training techniques to deal with non-prototypical synthetic data points in training
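A naive baseline for the detection and defence tasks above, together with the agreement measure used to compare a replica with the original model, might look like the sketch below. The per-client query budget, the Gaussian noise and the class name are hypothetical; a bare query-count threshold is only a starting point against which the state-of-the-art detection techniques would be compared.

```python
import numpy as np

class StealingMonitor:
    """Hypothetical defence: flag clients whose query count exceeds a budget
    and return noisy probability vectors to flagged clients."""

    def __init__(self, model, max_queries=1000, noise_scale=0.1, seed=0):
        self.model = model                   # callable: x -> probability vector
        self.max_queries = max_queries
        self.noise_scale = noise_scale
        self.counts = {}
        self.rng = np.random.default_rng(seed)

    def predict(self, client_id, x):
        self.counts[client_id] = self.counts.get(client_id, 0) + 1
        probs = np.asarray(self.model(x), dtype=float)
        if self.counts[client_id] <= self.max_queries:
            return probs                     # normal client: exact answer
        noisy = probs + self.rng.normal(0.0, self.noise_scale, size=probs.shape)
        noisy = np.clip(noisy, 1e-6, None)   # keep probabilities positive
        return noisy / noisy.sum()           # and renormalise to sum to 1

def agreement(labels_a, labels_b):
    """Fraction of inputs on which two models predict the same label."""
    return float(np.mean(np.asarray(labels_a) == np.asarray(labels_b)))

monitor = StealingMonitor(lambda x: [0.7, 0.2, 0.1], max_queries=2)
answers = [monitor.predict("client-1", x=None) for _ in range(3)]
print(answers[0], answers[2])                   # exact, then noisy (still sums to 1)
print(agreement([0, 1, 2, 1], [0, 1, 1, 1]))    # 0.75
```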


[1] F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Stealing machine learning models via prediction APIs. In USENIX Security Symposium, pages 601-618, 2016.

[2] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS '17, pages 506-519. ACM, 2017.


For further information: Please contact Samuel Marchal (samuel.marchal@aalto.fi) or Prof. N. Asokan.



_______________________________________________________________________________________________________________________________________________________________________________________________________



Finding new software vulnerabilities by web scraping (reserved) USABLE


Advances in natural language parsing have enabled fast, approximate sentence parsing with neural networks [1].
Recently, a structured model for text extraction called AETR (Agent-Event-Theme-Recipient) has been introduced,
which structures sentence clauses into four-word bag-of-words representations, one word per role. Such a representation captures more of the semantic meaning
of sentences than traditional n-gram bag-of-words representations do.
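The difference can be seen on two sentences that contain exactly the same words but describe opposite events. In this sketch the AETR roles are written out by hand, under the assumption that each clause is reduced to one word per role; in practice a parser such as [1] would produce them. The comparison is with a unigram bag of words.

```python
from collections import Counter

def bag_of_words(sentence):
    return Counter(sentence.lower().split())

def aetr_bag(clauses):
    """Bag of (agent, event, theme, recipient) tuples, one per clause."""
    return Counter(clauses)

s1 = "the attacker sends the server a payload"
s2 = "the server sends the attacker a payload"

# Roles filled in by hand for illustration (hypothetical parser output).
c1 = [("attacker", "send", "payload", "server")]
c2 = [("server", "send", "payload", "attacker")]

print(bag_of_words(s1) == bag_of_words(s2))   # True: identical word counts
print(aetr_bag(c1) == aetr_bag(c2))           # False: who does what to whom differs
```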

The purpose of this work is to write a web scraping tool that searches relevant online forums and web sites for potential new
vulnerabilities in software, given a certain piece of source code. If the student has a machine learning or data mining background,
he/she may additionally participate in the design of the system and in future steps, which may extend the work into a master's thesis.

Required skills: Good programming skills (e.g. Python).
Nice-to-have skills: Web programming (JavaScript/HTML) and database management skills. Machine learning and neural network skills.


Resources:

[1] Honnibal, M. "spaCy (Version 0.100.6) [Computer software]." (2016).

For further information: Please contact Mika Juuti (E-mail: mika.juuti@aalto.fi) or Tommi Gröndahl (E-mail: tommi.grondahl@aalto.fi) and prof. N. Asokan.


_______________________________________________________________________________________________________________________________________________________________________________________________________


Fast Cryptocurrency Payments using Trusted Hardware (reserved)



Summary: Improving the speed and efficiency of blockchain-based cryptocurrency payments.


Bitcoin payments take up to one hour to be confirmed, and this time might become even longer as the transaction volume increases. This long delay is due to the possibility of double-spending, i.e. attempting to send the same funds to two or more different recipients. We have developed an approach for providing very fast payment confirmations using a Trusted Execution Environment (TEE) [1]. So far, we have implemented this using Intel's new Software Guard Extensions (SGX), which are available on modern PC platforms, and we have integrated it into the popular CoPay cryptocurrency wallet.


The aim of this project is to implement a similar technique using ARM TrustZone, another type of TEE available on many smartphone platforms, and to integrate this into CoPay, and/or other cryptocurrency wallets.



Required skills:
- Excellent knowledge of basic information security concepts
- C programming skills

Nice to have:
- Experience with JavaScript

References:
[1] https://aaltodoc.aalto.fi/handle/123456789/27919



For further information: Please contact Andrew Paverd (Email: andrew.paverd@aalto.fi) and Prof. N. Asokan. Your application will receive special consideration if you can briefly explain in your own words how the approach in [1] prevents double-spending.


_______________________________________________________________________________________________________________________________________________________________________________________________________


Reinforcement learning for MLaaS model stealing attacks (reserved) ML & SEC



Machine Learning as a Service (MLaaS) is a new service paradigm, which outsources machine learning models to cloud service providers.
Clients may have to pay for each prediction they obtain from the server. Cloud-deployed models are, however, subject to
so-called model stealing attacks [1, 2], where a malicious user tries to obtain an exact replica (if it is a linear model) or
a near-replica of the MLaaS model.


Contextual bandits [3] are reinforcement learning systems that have recently gained research interest, for example in recommender systems.
In these systems, the agent has a number of levers that it can pull at any given time. Each lever yields a different reward,
which, in the contextual bandit problem, depends on the context, i.e. the state of the world. Such formulations are suitable for model
stealing attacks, where each lever can be thought of as a separate recommender system that suggests query points to explore.
The student will independently explore different formulations of the problem and ways to tune the contextual bandit algorithm.
Extending the topic to a master's thesis requires independent thinking and original solutions to the problem.
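One concrete way to map this onto code is LinUCB, a standard contextual bandit algorithm (named plainly here; the topic does not prescribe it). The sketch assumes that each arm is a hypothetical query-suggestion strategy, that the context is a feature vector describing the current state of the attack, and that the reward is something like the gain in agreement between replica and target model; here rewards are simulated from hidden linear models.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression reward model per arm plus an
    upper-confidence exploration bonus scaled by alpha."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # X^T X + I per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # X^T r per arm

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # estimated reward weights
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + bonus)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# Simulated attack: 3 hypothetical query-suggestion strategies (arms) whose
# usefulness depends linearly on a 4-dimensional description of the attack state.
rng = np.random.default_rng(0)
n_arms, dim, rounds = 3, 4, 2000
true_theta = rng.normal(size=(n_arms, dim))
bandit = LinUCB(n_arms, dim)
chosen, uniform = [], []
for _ in range(rounds):
    x = rng.normal(size=dim)
    expected = true_theta @ x                  # expected reward of each arm
    arm = bandit.select(x)
    bandit.update(arm, x, expected[arm] + rng.normal(scale=0.1))
    chosen.append(expected[arm])
    uniform.append(expected.mean())            # baseline: pick an arm at random

print(np.mean(chosen[-500:]), np.mean(uniform[-500:]))
```

Tuning `alpha` trades exploration of rarely tried strategies against exploiting the currently best one, which is one of the knobs the student would study.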

Required skills: Basic machine learning knowledge and good programming knowledge.
Nice to have skills: Understanding of neural networks and optimization.

Resources:

[1] Tramèr, Florian, et al. "Stealing Machine Learning Models via Prediction APIs." USENIX Security Symposium. 2016.

[2] Papernot, Nicolas, et al. "Practical black-box attacks against machine learning." Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 2017.

[3] Dwight Gunning. "An Introduction to Contextual Bandits". link: https://getstream.io/blog/introduction-contextual-bandits/ (accessed 29.8.2017)

For further information: Please contact Mika Juuti (E-mail: mika.juuti@aalto.fi) and prof. N. Asokan.

_________________________________________________________________________________________________________________________________________________________________________________________

Automatic Ownership Change Detection of IoT devices (reserved)

Considering the increasing deployment of IoT devices, their ownership is likely to change during their life cycle. Personal IoT devices used in smart home environments contain sensitive user data, so a change of ownership can threaten users' privacy. To address this problem, we need a system for securely handling ownership change of IoT devices. In particular, this system must provide a technique that allows an IoT device to detect by itself that its ownership has changed.

The goal of this project is to provide a solution to this problem. A possible solution could rely on context sensing, and more precisely on the analysis of surrounding connected objects. The set of surrounding objects can be continuously analysed to infer whether an IoT device has moved to an extent that would mean its ownership has changed. Time series analysis (e.g. change point detection), where each point represents the set of surrounding objects, is one possible approach. The proposed solution must be implemented and assessed using our existing testbed of several smart home IoT devices.
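The idea of treating each observed set of surrounding objects as a point in a time series can be sketched with a simple sliding-window heuristic that stands in for a proper change-point method. The Jaccard similarity, the thresholds and the device identifiers below are hypothetical choices; the point is that a brief trip away from home should not be reported, while a lasting change of surroundings should.

```python
def jaccard(a, b):
    """Similarity of two sets of device identifiers (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def detect_ownership_change(observations, threshold=0.3, patience=3, history=5):
    """Return the index where a persistent change of surroundings starts, or None.

    observations: one set of nearby device identifiers per time step.
    A step is anomalous if its similarity to the union of the last `history`
    normal steps is below `threshold`; `patience` consecutive anomalous steps
    are reported as a change point (short trips away from home are ignored).
    """
    baseline = []
    run_start, run_len = None, 0
    for i, obs in enumerate(observations):
        reference = set().union(*baseline[-history:]) if baseline else obs
        if jaccard(obs, reference) < threshold:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= patience:
                return run_start
        else:
            run_len = 0
            baseline.append(obs)
    return None

home = {"tv", "speaker", "router", "thermostat"}
new_home = {"router-2", "fridge", "lamp"}
trace = [home, home - {"tv"}, home, home | {"phone"}, new_home, new_home, new_home - {"lamp"}]
print(detect_ownership_change(trace))                       # 4
print(detect_ownership_change([home, {"cafe-ap"}, home]))   # None: a brief trip
```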

Requirements: C++ programming, network traffic monitoring (knowledge of machine learning and data mining techniques is a plus)

References: Enhancing Privacy in IoT Devices through Automated Handling of Ownership Change

For further information: Please contact Samuel Marchal (Email: Samuel.marchal@aalto.fi) or Prof. N. Asokan.

______________________________________________________________________________________________________________________________________________________________________________________

Benchmarking Byzantine fault-tolerance protocols BCON


The surging interest in blockchain technology has revitalized the search for effective Byzantine consensus schemes. In particular, the blockchain community has been looking for ways to effectively integrate traditional Byzantine fault-tolerant (BFT) protocols into a blockchain consensus layer allowing various financial institutions to securely agree on the order of transactions. 

The goal of this topic is to implement and deploy several state-of-the-art BFT protocols ([1], [2], [3]) on the Triton cluster and to benchmark them systematically. We will then adapt them to some concrete scenarios or integrate them into the Hyperledger framework [4]. By completing this, you will gain expertise in both the theoretical and the practical aspects of private blockchains.

Requirements:

- Basic knowledge of distributed systems

- Familiarity with C/C++ and network programming.

References: 

[1] https://www.cs.utexas.edu/~lorenzo/papers/kotla07Zyzzyva.pdf

[2] https://arxiv.org/abs/1612.04997

[3] https://arxiv.org/abs/1503.08768

[4] https://www.hyperledger.org



For further information: Please contact Jian Liu (Email: jian.liu@aalto.fi) or Prof. N. Asokan.

______________________________________________________________________________________________________________________________________________________________________________________

Secure Proofs-of-Publication with Trusted Hardware (Reserved) BCON

Many systems use transparency as a security feature, but it is difficult to prove publication without placing trust in the publisher. In Certificate Transparency, for example, logs issue promises of publication that are inserted into certificates, but we have only the log's word that it will actually publish them.

A monotonic counter can be used to bind each counter increment to a piece of data, so that a message that is not published leaves a visible gap in the sequence. We will use this to build a public bulletin board that provides a proof-of-publication to its posters, where the issuance of such a certificate convincingly demonstrates that the publisher must either publish within a certain time or be detectably in breach of protocol, as observable by the entire world.
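The gap-detection idea can be sketched as follows. The sketch is hypothetical throughout: the class stands in for trusted hardware holding a monotonic counter, an HMAC key stands in for the hardware's attestation key, and the published log is reduced to the list of counter values that appear on the bulletin board.

```python
import hashlib
import hmac

class CounterEnclave:
    """Hypothetical trusted component holding a monotonic counter. Each receipt
    binds the next counter value to the hash of exactly one message; an HMAC
    key stands in for a hardware attestation key."""

    def __init__(self, key):
        self._key = key
        self._counter = 0

    def issue(self, message):
        self._counter += 1                        # never decreases, never repeats
        body = f"{self._counter}:{hashlib.sha256(message).hexdigest()}".encode()
        return self._counter, hmac.new(self._key, body, hashlib.sha256).hexdigest()

def verify_receipt(key, counter, message, tag):
    body = f"{counter}:{hashlib.sha256(message).hexdigest()}".encode()
    return hmac.compare_digest(hmac.new(key, body, hashlib.sha256).hexdigest(), tag)

def find_gaps(published_counters):
    """Counter values missing from the published log: receipts were issued
    for them, but the corresponding messages were never published."""
    seen = set(published_counters)
    return [c for c in range(1, max(seen, default=0) + 1) if c not in seen]

key = b"hypothetical enclave key"
board = CounterEnclave(key)
receipts = {m: board.issue(m) for m in (b"post A", b"post B", b"post C")}
published = [receipts[b"post A"][0], receipts[b"post C"][0]]   # post B is withheld
print(find_gaps(published))                                    # [2]
counter_b, tag_b = receipts[b"post B"]
print(verify_receipt(key, counter_b, b"post B", tag_b))        # True
```

A gap at the very end of the log (the latest receipt not yet published) is indistinguishable from a message still in transit; this is where the requirement to publish within a certain time comes in.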

Requirements:

- Basic knowledge of cryptography.

- Familiarity with C/C++ programming. 


For further information: Please contact Lachlan Gunn (Email: lachlan.gunn@aalto.fi) or Prof. N. Asokan.

________________________________________________________________________________________________________________________________________________________________________________________

Text style obfuscation to prevent author deanonymization by linguistic style ML & SEC


The author of a text can be detected via stylometry, i.e. the classification of texts based on linguistic style. This leads to the possibility of a deanonymization attack, where the author is revealed against their will. To mitigate such an attack, the style of the text needs to be altered in a way that prevents author identification but retains the meaning to a sufficient extent. So far, most studies of automatic style obfuscation have used back-and-forth machine translation to change the original text. This method is crude and can harm both the meaning and the grammaticality of the text. This project involves developing additional methods for style obfuscation, as well as better evaluation criteria to assess their success not only in hiding the author but also in preserving the original meaning. The project will be conducted on English data. An understanding of natural language processing (NLP) and machine learning methodologies is required, as well as programming skills in some language(s) usable for these (e.g. Python or Java).
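The two evaluation goals (changing the style, keeping the meaning) can be made concrete with two deliberately crude proxies. This sketch assumes function-word frequencies as the style profile and content-word overlap as the meaning measure; the function-word list is a hypothetical, truncated choice, and a real evaluation would use an actual authorship classifier and a semantic similarity measure instead.

```python
import math
from collections import Counter

FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in", "that", "it", "is", "which"]

def style_profile(text):
    """Relative frequency of each function word (a classic stylometric feature)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    return [counts[w] / n for w in FUNCTION_WORDS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def content_overlap(a, b):
    """Crude meaning-preservation proxy: Jaccard overlap of content words."""
    wa = {w for w in a.lower().split() if w not in FUNCTION_WORDS}
    wb = {w for w in b.lower().split() if w not in FUNCTION_WORDS}
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

original = "it is the case that the results of the study are clear"
rewrite = "the study results are clear"
print(cosine(style_profile(original), style_profile(rewrite)))   # style similarity (lower = more obfuscated)
print(content_overlap(original, rewrite))                         # 0.8
```

A good obfuscation method should lower the first number while keeping the second high.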

Requirements: programming skills in some language usable for NLP and machine learning (e.g. Python, Java), basic understanding of machine learning.

Nice to have: basic understanding of NLP.

Resources:

[1] Mishari Almishari, Ekin Oguz, and Gene Tsudik. Fighting Authorship Linkability with Crowdsourcing. In A. Sala, A. Goel, and K. Gummadi, editors, Proceedings of the Second ACM Conference on Online Social Networks, pages 69–82, 2014.
[2] Michael Brennan, Sadia Afroz, and Rachel Greenstadt. Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity. ACM Transactions on Information and System Security, 15(3), 2011.
[3] Siobahn Day, James Brown, Zachery Thomas, India Gregory, Lowell Bass, and Gerry Dozier. Adversarial Authorship, AuthorWebs, and Entropy-Based Evolutionary Clustering. In 25th International Conference on Computer Communication and Networks (ICCCN), 2016.
[4] Nathan Mack, Jasmine Bowers, Henry Williams, Gerry Dozier, and Joseph Shelton. The Best Way to a Strong Defense is a Strong Offense: Mitigating Deanonymization Attacks via Iterative Language Translation. International Journal of Machine Learning and Computing, 5(5):409–413, 2015.
[5] Martin Potthast, Matthias Hagen, and Benno Stein. Author obfuscation: Attacking the state of the art in authorship verification. In CLEF 2016 Working Notes, 2016.

For further information: Please contact Tommi Gröndahl (E-mail: tommi.grondahl@aalto.fi) or Mika Juuti (E-mail: mika.juuti@aalto.fi) and prof. N. Asokan.

_________________________________________________________________________________________________________________________________________________________________________________________

Privacy-preserving machine learning predictions using trusted hardware ML & SEC

Machine learning models hosted in a cloud service are increasingly popular but risk privacy: clients sending prediction requests to the service need to disclose potentially sensitive information. In this topic, we explore the problem of privacy-preserving predictions: after each prediction, the server learns nothing about clients’ input and clients learn nothing about the model.

We have an existing oblivious neural network transformation framework, MiniONN [1], which provides this functionality. However, it has two drawbacks: it incurs a large overhead compared with non-private solutions, and the server cannot filter the input, i.e., it cannot meter the use of the service based on the type of input. We aim to solve these two problems by leveraging the trusted hardware present in many modern chipsets.
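One building block of cryptographic frameworks of this kind is additive secret sharing. The client's input x is split into two random-looking shares whose sum modulo N equals x, and applying a linear layer to each share yields shares of the layer's output. The sketch below shows only that linearity property for integer inputs. The modulus 2^32 and the helper names are assumptions, and this is not the MiniONN protocol itself. MiniONN additionally needs a way to multiply the client's share by the server's private weights without revealing either, and a separate mechanism for non-linear layers (see [1]).

```python
import secrets

MODULUS = 2 ** 32  # ring Z_N in which shares live (assumed parameter)

def share(vector):
    """Split an integer vector into two additive shares modulo MODULUS."""
    client_share = [secrets.randbelow(MODULUS) for _ in vector]
    server_share = [(x - r) % MODULUS for x, r in zip(vector, client_share)]
    return client_share, server_share

def reconstruct(a, b):
    """Recombine two additive shares into the original values."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]

def linear_layer(weights, vector):
    """Integer matrix-vector product modulo MODULUS."""
    return [sum(w * x for w, x in zip(row, vector)) % MODULUS
            for row in weights]

# Because W*(c + s) = W*c + W*s (mod N), the layer can be applied to each
# share separately and the results recombined into W*x.
W = [[1, 2], [3, 4]]
x = [5, 6]
c, s = share(x)
assert reconstruct(linear_layer(W, c), linear_layer(W, s)) == linear_layer(W, x)
```

In this toy example both shares pass through the same weights in the clear. The point of a real protocol is precisely to avoid that, which is where the overhead mentioned above comes from.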

Requirements: C programming, basic understanding of machine learning and cryptography, good software engineering skills.

[1] Liu, Jian, et al. "Oblivious Neural Network Predictions via MiniONN transformations."

[2] https://github.com/onnx

For further information: Please contact Jian Liu (E-mail: jian.liu@aalto.fi), Mika Juuti (E-mail: mika.juuti@aalto.fi) and prof. N. Asokan.

______________________________________________________________________________________________________________________________________________________________________________________________


Smart grid project at EPFL: real-time control of electric grids using COMMELEC framework OTHER


Description


The proposed work is part of the smart grid project at EPFL [1]. More specifically, the internship topic is the real-time control of electric grids using the COMMELEC framework [2]. COMMELEC uses multiple software agents to steer the flexible resources of the grid so that a given objective is achieved while the different grid resources are used optimally and the grid always stays within safe operating limits.


The successful candidate will develop a software agent for uncontrollable loads within the COMMELEC framework. This agent should forecast the load for the next COMMELEC cycle, which typically lasts 100 milliseconds. More precisely, every 100 milliseconds the agent will compute prediction intervals for the next load value at different confidence levels. The forecast is calculated from historical load data and/or real-time measurements from Phasor Measurement Units (PMUs) [3].
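To make the idea of prediction intervals at several confidence levels concrete, the sketch below uses a persistence forecast, in which the next cycle's load is assumed to equal the last observed value. It then widens that forecast with empirical quantiles of past cycle-to-cycle changes. This is neither ARMA nor CEQ but only an assumed baseline. The function names, the default levels, and the input format of one load value per 100 ms cycle are hypothetical.

```python
import math

def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def prediction_intervals(history, levels=(0.5, 0.9, 0.95)):
    """Persistence forecast for the next cycle plus empirical intervals.

    history: load values, one per COMMELEC cycle, oldest first (>= 2 values).
    Returns {level: (lower, upper)} centred on the last observed value,
    using the empirical distribution of past one-cycle changes.
    """
    changes = sorted(b - a for a, b in zip(history, history[1:]))
    last = history[-1]
    intervals = {}
    for level in levels:
        alpha = (1.0 - level) / 2.0  # probability mass left in each tail
        intervals[level] = (last + quantile(changes, alpha),
                            last + quantile(changes, 1.0 - alpha))
    return intervals
```

For example, prediction_intervals([10, 11, 10, 12, 11])[0.5] returns (10.0, 12.25). A real agent would also bound the history length so that each call fits comfortably within the 100 ms cycle.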


Various methods can be used for this load forecasting, e.g., time series analysis using ARMA models [4] or CEQ (cluster-extract-quantiles) [5]. The master's thesis in [5] shows that the CEQ method performs better than forecasting with ARMA models. The first goal of this internship is a preliminary analysis to find a forecasting method that performs better than the one proposed in [5]. If no better method is found, the student will develop the software agent in C++ using the CEQ method as proposed in [5]. In either case, one of the main challenges will be the design of the agent: it receives new measurements from the PMUs every 20 ms, and forecasting must be done very quickly, since each COMMELEC cycle lasts only 100 milliseconds.
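The name cluster-extract-quantiles suggests three steps: cluster past load patterns, extract the values that followed patterns in the same cluster as the current one, and report quantiles of those values. The sketch below is a hypothetical reading of that idea and may differ from the method specified in [5]. The window length, the number of clusters k, the k-means variant with farthest-point initialisation, and all function names are assumptions.

```python
import math

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def quantile(sorted_values, q):
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def ceq_style_interval(history, window=3, k=2, level=0.9):
    """Hypothetical cluster-extract-quantiles interval for the next value."""
    # Cluster: every past window of `window` values that has a known successor.
    n = len(history) - window
    windows = [history[i:i + window] for i in range(n)]
    successors = [history[i + window] for i in range(n)]
    centroids, labels = kmeans(windows, k)
    # Extract: successors of windows in the same cluster as the latest window.
    current = history[-window:]
    cluster = min(range(k), key=lambda c: sq_dist(current, centroids[c]))
    extracted = sorted(s for s, lab in zip(successors, labels) if lab == cluster)
    if not extracted:  # fall back to all successors if that cluster is empty
        extracted = sorted(successors)
    # Quantiles: central `level` probability mass of the extracted values.
    alpha = (1.0 - level) / 2.0
    return quantile(extracted, alpha), quantile(extracted, 1.0 - alpha)
```

On a toy trace that repeats a spike every fourth cycle, ceq_style_interval([1, 1, 1, 9, 1, 1, 1, 9, 1, 1, 1], window=3, k=4) returns (9.0, 9.0), because the only past windows matching the current pattern [1, 1, 1] were followed by the spike. The function expects len(history) > window. Given the timing constraint above, the clustering step would presumably run offline or in a background thread. Each 100 ms cycle would then only fold in the five new PMU samples (one every 20 ms) and perform the nearest-centroid lookup and quantile step.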




Qualifications: 


- Basic knowledge of Machine Learning techniques


- Proficiency in C++ Software Development


- Capability to (1) work independently and (2) timely deliver the given assignments




References:


[1] https://smartgrid.epfl.ch 


[2] https://smartgrid.epfl.ch/?q=control


[3] https://smartgrid.epfl.ch/?q=monitoring


[4] https://en.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model


[5] http://zeus3.lkn.ei.tum.de/talks/studentwork.php?id=1962&L=0




For more information on the topic of the internship and for further questions, please contact: Jagdish Achara jagdish.achara@epfl.ch 

_______________________________________________________________________________________________________________________________________________________________________________________________________________

Detecting troll-users from discussion forums ML & SEC

Professional trolls regularly post on discussion forums with the hidden agenda of advancing a particular position, which does not necessarily reflect their true opinion. They can be used by institutions such as governments or corporations in attempts to influence people's opinions and behaviour, up to and including election results. This project involves applying machine learning to detect troll users on either Finnish or English discussion forums (depending on the participant's language proficiency). The project requires an understanding of natural language processing (NLP) and machine learning methodologies, as well as programming skills in some language(s) usable for these (e.g. Python or Java).
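Troll detection is commonly framed as supervised classification of labelled posts or users. As a minimal baseline, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing on bag-of-words features. The class labels, the toy training texts, and the class and method names are hypothetical, and the approaches in [1]–[4] may use quite different features and models. Python's \w pattern matches Unicode letters, so the tokenizer also handles Finnish characters such as ä and ö.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercased word tokens (Unicode-aware, so Finnish letters are kept)."""
    return re.findall(r"\w+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            counts = self.word_counts[c]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods of each token
            score = math.log(self.doc_counts[c] / total_docs)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label

model = NaiveBayes().fit(
    ["wake up people the media lies", "the media lies to you wake up",
     "nice weather for a walk today", "the bus was late this morning"],
    ["troll", "troll", "normal", "normal"])
```

With this toy model, model.predict("the media lies again") returns "troll". A realistic study would replace the toy data with labelled forum data and evaluate on a held-out test set.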

Requirements: programming skills in some language usable for NLP and machine learning (e.g. Python, Java), basic understanding of machine learning.

Nice to have: basic understanding of NLP.

Resources:

[1] Erik Cambria, Praphul Chandra, Avinash Sharma, and Amir Hussain. Do not feel the trolls. In Proceedings of the 3rd International Workshop on Social Data on the Web, 2010.

[2] Patxi Galán-García, José Gaviria de la Puerta, Carlos Laorden Gómez, Igor Santos, and Pablo García Bringas. Supervised machine learning for the detection of troll profiles in Twitter social network: Application to a real case of cyberbullying. In Álvaro Herrero, Bruno Baruque, Fanny Klett, Ajith Abraham, Václav Snášel, André C.P.L.F. de Carvalho, Pablo García Bringas, Ivan Zelinka, Hector Quintián, and Emilio Corchado, editors, International Joint Conference, Advances in Intelligent Systems and Computing, volume 239, pages 419–428, 2014.

[3] Todor Mihaylov, Georgi D. Georgiev, and Preslav Nakov. Finding opinion manipulation trolls in news community forums. In Proceedings of the 19th Conference on Computational Natural Language Learning, pages 310–314, 2015.

[4] Chun Wei Seah, Hai Leong Chieu, Kian Ming A. Chai, Loo-Nin Teow, and Lee Wei Yeong. Troll Detection by Domain-Adapting Sentiment Analysis. In Proceedings of the 18th International Conference on Information Fusion, pages 792–799, 2015.

For further information: Please contact Tommi Gröndahl (E-mail: tommi.grondahl@aalto.fi) or Mika Juuti (E-mail: mika.juuti@aalto.fi) and prof. N. Asokan.

