Work in the work package “Data Analytics Technologies for CloSer” is progressing on several parallel tracks, related to the various scenarios described in work package 1.

Android Malware Detection (Scenario 1)

Arcada, Aalto, and F-Secure have continued their collaboration to research more reliable machine learning methods for identifying malicious Android application packages. Sparse random projections have been found to be effective at reducing the dimensionality of the problem while retaining detection accuracy [1]. A more complete overview of the methodology and models is published in Computers & Security [2]. A demonstration of the system was presented at the CloSer workshop in April 2017.
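To illustrate the dimensionality reduction step mentioned above, the following is a minimal sketch of a sparse random projection applied to a binary feature vector. It is not the implementation described in [1] or [2]. The Achlioptas-style parameterization (sparsity parameter s = 3), the function names, the seed handling, and the feature indices are all assumptions made for illustration.

```python
import math
import random


def projection_row(feature_index, k, s=3, seed=0):
    """Sparse projection weights for one input feature, as {output_dim: weight}.

    Each weight is +sqrt(s/k) or -sqrt(s/k) with probability 1/(2s) each,
    and zero otherwise (zeros are simply not stored).
    """
    # Hypothetical seeding scheme: rows are regenerated on demand, so the
    # full projection matrix never has to be held in memory.
    rng = random.Random(seed * 1_000_003 + feature_index)
    scale = math.sqrt(s / k)
    row = {}
    for j in range(k):
        u = rng.random()
        if u < 1 / (2 * s):
            row[j] = scale
        elif u < 1 / s:
            row[j] = -scale
    return row


def project(active_features, k, s=3, seed=0):
    """Project a binary sparse vector, given as the indices of its
    non-zero features, down to k dimensions."""
    out = [0.0] * k
    for i in sorted(active_features):
        for j, w in projection_row(i, k, s, seed).items():
            out[j] += w
    return out


# Hypothetical sample: three active features out of a very large feature space.
sample = {12, 40817, 999983}
z = project(sample, k=64)
print(len(z))
```

Generating each row from a seed keeps the sketch independent of the total number of input features, which is convenient when the feature space is very large and each sample activates only a few features.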

F-Secure has deployed a malware detector based on this research and integrated it into their production systems. The detection accuracy of the new model has met expectations while maintaining a negligible false positive rate.

Arcada and F-Secure have begun to explore the possibility of incorporating an anomaly detection system in the malware classifier, researching whether such unsupervised learning methods can provide additional insight into the reliability of the classifier on previously unseen data.

User and Traffic Profiling for Anomaly Detection (Scenario 5)

User/End-point Profiling

F-Secure has collected a sizeable dataset of activity on various computers, which offers a unique opportunity to investigate to what degree anomalous behaviour can be identified from such observations using machine learning. Because the data is sensitive, it can only be accessed on-site. Since early December 2017, Arcada researchers have been making regular intra-project visits to the F-Secure premises to explore the data and to define the relevant research questions more clearly.

Traffic Profiling

Nokia researchers have evaluated the optimization and performance of a differential anomaly detection model for SDN-enabled networks [3,5]. In addition, the problem has been studied from the perspective of an IoT robot security use case [4,6].

A public demonstration of the differential anomaly detector for SDN-enabled networks was presented at the IFIP/IEEE Symposium on Integrated Network and Service Management [7] as well as at the CloSer workshop in April 2017.

Image Analysis for Website Reputation Services (Scenario 6)

For web content analysis, the information contained in image files has been identified as a valuable source that is currently not used to its full extent. In order to categorize images for this purpose, Arcada researchers have developed a computer vision prototype using deep learning [12]. A journal paper describing the methodology and more detailed results will be ready for submission soon [13].

F-Secure has begun to explore how the new models can optimally be integrated into existing implementations of content filtering services, in collaboration with Arcada.

Related to the image classification task, Arcada has additionally focused on developing new strategies for dealing with noisy labels based on unsupervised learning [8] and active learning [9].

Additional Activities

Arcada researchers have continued to study how sparse random projections can be used for large-scale learning tasks [10]. The paper presents results on data from the Android malware detection use case (Scenario 1), but the methods also apply to the other scenarios and to a wider range of problems involving large unstructured data. Related work [11] was presented at the ELM 2017 conference (where Kaj-Mikael Björk acted as international liaison) to develop international collaboration around the subject.

Contributions published / to appear:

  1. Luiza Sayfullina, Emil Eirola, Dmitry Komashinsky, Paolo Palumbo, Juha Karhunen, "Android Malware Detection: Building Useful Representations", IEEE International Conference on Machine Learning and Applications (IEEE ICMLA'16), Anaheim, USA, 2016.
  2. Paolo Palumbo, Luiza Sayfullina, Dmitriy Komashinskiy, Emil Eirola, Juha Karhunen, "A pragmatic android malware detection procedure", Computers & Security, vol. 70, pp. 689-701, 2017.
  3. Mehrnoosh Monshizadeh, Vikramajeet Khatri, Raimo Kantola, "Detection as a service: An SDN application", in 19th International Conference on Advanced Communication Technology (ICACT), Bongpyeong, pp. 285-290, 2017.
  4. Mehrnoosh Monshizadeh, Vikramajeet Khatri, Raimo Kantola, Zheng Yan, "An Orchestrated Security Platform for Internet of Robots", in Proceedings of the 12th International Conference on Green, Pervasive, and Cloud Computing (GPC), Italy, pp. 298-312, 2017.
  5. Mehrnoosh Monshizadeh, Vikramajeet Khatri, "Mobile Virtual Network Operators (MVNO) Security", in A Comprehensive Guide to 5G Security, Wiley Publishers, pp. 323-346, 2017.
  6. Mehrnoosh Monshizadeh, Vikramajeet Khatri, "IoT Security", in A Comprehensive Guide to 5G Security, Wiley Publishers, pp. 247-266, 2017.
  7. Mehrnoosh Monshizadeh, Vikramajeet Khatri, Raimo Kantola, "An adaptive detection and prevention architecture for unsafe traffic in SDN enabled mobile networks", in IFIP/IEEE Symposium on Integrated Network and Service Management (IM), Lisbon, pp. 883-884, 2017.
  8. Anton Akusok, Emil Eirola, Yoan Miche, Ian Oliver, Kaj-Mikael Björk, Andrey Gritsenko, Stephen Baek, Amaury Lendasse, "Incremental ELMVIS for unsupervised learning", International Conference on Extreme Learning Machines (ELM2016), Singapore, 2016.
  9. Anton Akusok, Emil Eirola, Yoan Miche, Andrey Gritsenko, Amaury Lendasse, "Advanced Query Strategies for Active Learning with Extreme Learning Machine", in European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2017.
  10. Anton Akusok, Emil Eirola, "Comparison of Classification Methods for Very High-Dimensional Data in Sparse Random Projection Representation", submitted to Neurocomputing, Special Issue on Advances in Data Representation and Learning for Pattern Analysis, under review.
  11. Anton Akusok, Emil Eirola, Kaj-Mikael Björk, Amaury Lendasse, "Extreme Learning Tree", International Conference on Extreme Learning Machines (ELM 2017), Yantai, China, 2017.
  12. Leonardo Espinosa Leal, Kaj-Mikael Björk, Amaury Lendasse, Anton Akusok, "A Web Page Classifier Library Based on Random Image Content Analysis Using Deep Learning", submitted to PETRA'18.
  13. Anton Akusok, Leonardo Espinosa Leal, "Full Page Web Content Analysis", to appear.

