GSoC 2016 – Rumal take 2

The Honeynet Project was accepted once again as a Mentoring Organization for Google Summer of Code 2016, and we are extremely proud of this.

Rumal was developed during GSoC 2015 by Tarun Kumar, and is now in its early alpha stage. We are also working on a Dockerized version to make things even simpler to install and test.

As you may know, Rumal is composed of two separate projects, a front-end and a back-end. You can find a detailed description of both in this other blog post, and a picture of Rumal’s general architecture can be found below.


Both front-end and back-end still need a lot of work before they can be used in production: here are some points we would like to investigate during GSoC 2016.

  • Both front-end and back-end should avoid using two different databases each (MongoDB for Thug’s results and a relational database for Django’s users and models). We should consider dropping the relational DB and using Mongo for authentication/authorization. Some further reading here and here.
  • Communication between front-end and back-end now happens through a set of REST APIs; this should be changed to use a message queue system such as RabbitMQ or Redis. This would avoid polling loops with sleep times, which introduce a lot of delay.
  • The enrich daemon in the front-end project supports plugins. The current interface for plugins is powerful, but it may be improved both in the Python part and in the visualization part, giving plugin writers more control over how their data is generated and shown to the end user.
  • The enrich daemon should also be able to process multiple tasks in parallel. Processes should be used instead of threads whenever possible.
  • The back-end daemon run_thug currently uses Python’s subprocess module to launch Thug in a dockerized environment. While this works pretty well, we may want to look into Docker’s own Python API instead.
  • Visualization of analysis results should be thoroughly tested with real analysis data, and adjusted to give a better idea of what the real outcome is. (e.g. is that site actually malicious or suspicious? what are the key indicators of compromise?)
  • The whole code base – especially the JavaScript code used in the front-end’s GUI – needs some rethinking, some clean-up and better exception handling.
  • Thug supports analyzing sites through proxies. Proxy support was taken into account while developing Rumal’s GUI, but it still needs to be implemented in the code.
  • Rumal was designed to serve both as a stand-alone GUI to be run on a single machine and as a sort of social network or collaboration tool where multiple users can share analyses, comments, etc. This part still needs to be fully implemented.
  • One of the main features we thought about while designing Rumal was the ability to search among different analyses to find similarities and differences, but also to compare two or more analysis results in a visually descriptive way. This part still needs to be fully implemented, too.
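To give an idea of what the queue-based communication mentioned in the list above could look like, here is a minimal sketch. It uses Python’s in-process queue.Queue as a stand-in for a real broker such as RabbitMQ or Redis; the function names and the task format are assumptions made for illustration and are not taken from Rumal’s code. The point is that the worker blocks on the queue and wakes up as soon as a task arrives, instead of polling and sleeping.

```python
import queue
import threading


def backend_worker(tasks, results):
    """Hypothetical back-end loop: block on the queue instead of polling."""
    while True:
        task = tasks.get()           # blocks until a task is available
        if task is None:             # sentinel value: stop the worker
            break
        # Placeholder for the real work (e.g. launching a Thug analysis).
        results.append("analyzed " + task["url"])


def submit_and_wait(urls):
    """Hypothetical front-end side: queue the tasks and wait for completion."""
    tasks = queue.Queue()            # stand-in for RabbitMQ/Redis
    results = []
    worker = threading.Thread(target=backend_worker, args=(tasks, results))
    worker.start()
    for url in urls:
        tasks.put({"url": url})
    tasks.put(None)                  # tell the worker there is nothing left
    worker.join()
    return results
```

With a real broker the two functions would live in different processes (or on different hosts), but the blocking "get" pattern would stay the same.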

Also, the lack of structured documentation is a problem that may stop people from trying out Rumal and contributing to it. We definitely need to address this in GSoC 2016.

Posted in Uncategorized

GSoC 2015: Introducing YAPDNS


This post is mainly intended for GSoC 2015 students who might want to consider contributing to a new Honeynet Project tool called YAPDNS (Yet Another Passive DNS). If you are interested in contributing to YAPDNS outside of GSoC, you may find this interesting as well.

Passive DNS

A passive DNS sensor extracts data from DNS queries and responses and forwards it to a central database, which processes and stores it. Passive DNS data is extremely useful to understand the historical evolution of a DNS entry and, more generally, the associations between IP addresses and DNS names. This kind of information comes in pretty handy when dealing with malicious activities such as Command & Control servers, exploit kits and so on, as it can help you find clusters of related incidents. Some examples of the questions it can answer:

  • What are all the IP addresses that have been associated with a particular DNS name?
  • What are all the DNS names associated with a given IP address?
  • Is this domain being resolved in different ways depending on the client’s geolocation? (e.g. is this domain a CDN?)
  • Does this domain have a Fast Flux behavior?
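The first two questions above boil down to looking up the same set of observations in two directions. Here is a minimal in-memory sketch of that idea; the class and method names are made up for illustration, and a real passive DNS database would of course use persistent storage and proper indexes.

```python
class PassiveDnsStore:
    """Toy passive DNS store (hypothetical, for illustration only)."""

    def __init__(self):
        # (name, ip) -> [first_seen, last_seen]
        self._seen = {}

    @staticmethod
    def _normalize(name):
        # DNS names are case-insensitive; drop the optional trailing dot.
        return name.lower().rstrip(".")

    def add(self, name, ip, timestamp):
        key = (self._normalize(name), ip)
        if key in self._seen:
            first, last = self._seen[key]
            self._seen[key] = [min(first, timestamp), max(last, timestamp)]
        else:
            self._seen[key] = [timestamp, timestamp]

    def ips_for(self, name):
        """All IP addresses ever seen for a DNS name."""
        name = self._normalize(name)
        return sorted(ip for (n, ip) in self._seen if n == name)

    def names_for(self, ip):
        """All DNS names ever seen for an IP address."""
        return sorted(n for (n, i) in self._seen if i == ip)

    def seen_range(self, name, ip):
        """(first_seen, last_seen) for a name/IP pair, or None."""
        value = self._seen.get((self._normalize(name), ip))
        return tuple(value) if value else None
```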

Existing implementations

The most famous passive DNS system is probably DNSDB, initiated by the Internet Systems Consortium (ISC) and then acquired by Farsight Security. Farsight Security is a commercial entity but is committed to sharing security-related telemetry data with security industry partners and academic researchers at nominal or non-discriminatory subscription rates.

VirusTotal also has its own passive DNS system, which is free for non-commercial use. However, if I am not mistaken, this database is mostly based on data extracted from VirusTotal’s own activities (e.g. queries performed by their sandboxes while analyzing submitted samples, etc.) and, as such, is not aimed at being a complete passive DNS database, but rather a directory of known malicious domains.

Other similar databases exist, but they are either non-free or not complete enough.

Passive DNS data is everywhere

The concept behind YAPDNS is that you don’t actually need to analyze direct DNS queries and responses to get passive DNS data. Extracting this piece of information from other sources is often more than enough.

Want an example? Well, suppose you are using a SIEM (e.g. Splunk, ELK or ELSA) to collect logs from your company’s proxy server. Those logs almost always contain an association between an IP address (the destination IP) and a domain name (the web site’s name). In this case, you can safely assume that any domain associated with an IPv4 address corresponds to an “A” record on the DNS, while a domain associated with an IPv6 address should correspond to an “AAAA” record.
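The IPv4/IPv6 rule just described is easy to express with Python’s standard ipaddress module. This is only a sketch; the function name is an assumption.

```python
import ipaddress


def infer_record_type(ip_text):
    """Guess the DNS record type from the destination IP seen in a log."""
    ip = ipaddress.ip_address(ip_text)   # raises ValueError on invalid input
    return "A" if ip.version == 4 else "AAAA"
```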

Now, suppose you’re also collecting logs from your company’s mail server: again, if you see an outgoing SMTP flow to a certain IP address, you can assume that an “MX” record exists for the mail recipient’s domain name, pointing to a mail host that resolves to that IP address.

You could point out that it would be much simpler (and probably more accurate) to point your DNS server’s logs to the same SIEM and use direct DNS data to build your database. That’s correct, of course, but there are a number of times when you simply can’t access the DNS server’s logs and/or those logs aren’t complete enough.

But, of course, there’s much more than that. Assume you are collecting data from both direct and indirect sources. Assume you’ve built a huge database with that data. Think about your own web site’s domain name: you know it’s a “static” A record; you know it’s being resolved to the same IP address all over the world (regardless of where a client comes from) and you haven’t changed it in a while. Of course, direct DNS entries (e.g. direct monitoring of DNS query logs) can confirm this. Now, assume one of your passive sensors collecting indirect DNS data from a proxy’s logs is seeing your domain name associated with a completely different IP address: that’s an anomaly that could mean that the client’s hosts file was altered, or that it’s using a DNS service that is serving an altered record. When analyzing security-related data, this kind of information is extremely important.
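The anomaly described above can be sketched as a comparison between what direct DNS data says and what indirect sources report. The data layout and names below are assumptions chosen for the example, not a YAPDNS design.

```python
def find_anomalies(direct, indirect):
    """Return indirect observations that contradict direct DNS data.

    direct:   dict mapping a domain name to the set of IPs seen in DNS logs
    indirect: iterable of (name, ip, source) tuples, e.g. from proxy logs
    """
    anomalies = []
    for name, ip, source in indirect:
        known_ips = direct.get(name)
        # Only flag names we have direct data for; unknown names are
        # simply new information, not a contradiction.
        if known_ips and ip not in known_ips:
            anomalies.append((name, ip, source))
    return anomalies
```

A flagged entry is only a hint (altered hosts file, rogue resolver, or simply a record that changed recently), so it would still need a human or another rule to confirm it.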


Our idea about YAPDNS’s architecture is that of having multiple connectors that should be able to extract the relevant data (basically: IP address, domain name and timestamp – and hopefully record type like A, AAAA, MX etc) from various sources. Then, this data should be forwarded to a YAPDNS Server that should process and store it for further reference. On top of that, a Web GUI (hopefully based on Django) and a set of APIs (hopefully made with TastyPie) should make that data available to humans and to other tools.

YAPDNS should also be integrated with HPFriends, which is a tool developed by The Honeynet Project to allow different individuals and groups to share the data they collect with their analysis systems.


Simply said, connectors should be able to collect passive DNS data from as many sources as possible. For example, think about a host running syslog-ng to receive logs from a web proxy. Syslog-ng lets you define a set of parsers (see PatternDB) to be applied to incoming logs; then you can pass those already parsed logs to an external script that would be our connector. This connector takes the relevant data out of the parsed log and sends it to the central processing component.
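As a rough idea of what such a connector could do with one already parsed log line, here is a sketch. The log format (timestamp, destination IP, HTTP method and URL separated by spaces) is purely hypothetical, and so are the function and field names; a real connector would follow whatever the syslog-ng parser emits.

```python
import ipaddress
from urllib.parse import urlsplit


def parse_proxy_line(line):
    """Extract passive DNS data from one proxy log line.

    Assumed (hypothetical) format: "<timestamp> <dst_ip> <method> <url>".
    Returns a dict, or None if the line cannot be used.
    """
    parts = line.split()
    if len(parts) < 4:
        return None
    timestamp, dst_ip, _method, url = parts[:4]
    host = urlsplit(url).hostname        # None if the URL has no host part
    if not host:
        return None
    try:
        version = ipaddress.ip_address(dst_ip).version
    except ValueError:
        return None
    return {
        "timestamp": timestamp,
        "name": host,
        "ip": dst_ip,
        "rtype": "A" if version == 4 else "AAAA",
        "direct": False,                 # indirect source: not a DNS log
    }
```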

Another connector could run as a scheduled job (e.g. from crontab) to periodically retrieve logs from SIEMs like Splunk or ELK Stack or even ELSA, extract the relevant info from those logs and then send it to the processing component.

Of course, connectors should also exist for direct DNS data such as DNS logs or Bro IDS logs, and it should be possible to distinguish between direct and indirect data.


The central processing component would essentially take the input data – that has already been normalized by the connectors – and store it to a database. Data should also be enriched with metadata, such as suspicious behaviours detected by some specific rules, e.g.:

  • if a domain is associated with a lot of different IP addresses that change every few seconds, then it may be a fast flux domain;
  • if the same domain is resolved with different IP addresses based on the geolocation of the client making the query, then it could be a CDN.
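The first rule above could be sketched as a sliding-window check over the observations collected for a single domain. The thresholds (five distinct IPs within 60 seconds) are arbitrary values picked for the example, not something defined by YAPDNS.

```python
def looks_like_fast_flux(observations, min_ips=5, window=60):
    """Heuristic fast flux check for one domain.

    observations: iterable of (timestamp_in_seconds, ip) tuples.
    Returns True if at least `min_ips` distinct IPs were seen within
    any `window` seconds. Both thresholds are illustrative assumptions.
    """
    obs = sorted(observations)
    start = 0
    for end in range(len(obs)):
        # Shrink the window until it spans at most `window` seconds.
        while obs[end][0] - obs[start][0] > window:
            start += 1
        distinct = {ip for _, ip in obs[start:end + 1]}
        if len(distinct) >= min_ips:
            return True
    return False
```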

Another kind of metadata that could be associated with DNS entries is WHOIS information for both IP addresses and domains.

The YAPDNS server should also be able to forward the collected information, along with any other useful metadata, to external systems such as HPFriends. Communication with other projects and software may use the Common Output Format proposed by this draft on IETF.


The centralized GUI would be a Django app with user authentication; the HTML/JavaScript part could be made with Bootstrap, jQuery and other similar de-facto standards to keep things nice and simple. It would let you search the data performing tasks such as:

  • find the history of all IP addresses associated with a given domain
  • find the history of all domains associated with IP address X.Y.Z.W


If you feel like getting involved in YAPDNS’s development, either as a GSoC student or as a contributor, you should make yourself familiar with these concepts:

  • Passive DNS systems (general concepts and existing implementations)
  • Syslog-ng and PatternDB
  • SIEMs (Splunk, ELK, ELSA, etc.)
  • Python, Django, TastyPie, Bootstrap, jQuery: that’s what most of the project will be made with

Should you have any questions, please feel free to subscribe to the Honeynet Project’s GSoC mailing list, or ping us via Twitter: @PietroDelsante and @a_de_pasquale.

Posted in GSoC, Honeynet Project, News

GSoC 2015: Introducing Thug’s Rumal


This post is mainly intended for GSoC 2015 students who might want to consider contributing to a pretty new Honeynet Project tool called Rumal. If you are interested in contributing to Rumal outside of GSoC, then you will most probably already know Thug, so you can safely skip the first part.


Before talking about Rumal, we’d better spend a few words on Thug – if you already know it, you can skip this part. As you know, Thug was born as a tool to study exploit kits [1], and it does so by emulating a real browser complete with a set of plugins like Adobe Reader, Flash and Java. In a few words (Angelo, please don’t kill me), when you point it to a compromised web page, it “crawls” it and starts fetching and executing any internal or external JavaScript, following redirects and downloading files just like a browser would do. When it encounters some files it cannot analyze by itself (like Flash, Java and PDF), it passes them to some external tools. Thug’s results are then collected in a variety of formats, among which there’s a MongoDB database.

Unfortunately, while it’s extremely good at doing what it does, Thug still has one big problem: you need to be a real h4x0r to use it. Its results can be quite hard to read and most people don’t even have the skills to understand them, so they only see if they can get the exploit kit’s binary payload (the malware’s *.exe file) and, if they can’t, they move along and say the page is not malicious, or maybe the exploit kit is inactive [2]. That’s where a good web GUI would come in handy, and that’s exactly what Thug’s Rumal was born for: there’s plenty of information that could be extracted from Thug’s results and that could point you to the conclusion that the page is actually malicious and you’d better check your Windows PC twice 🙂

Thug’s Rumal

Now, let’s come to Rumal. We have been developing it for a few weeks now, so it’s still a very young project. We still haven’t released the code on GitHub because we still have to polish it a bit, but we will do it as soon as possible. EDIT 2015/03/15: We finally released Rumal’s source code on GitHub! Please see below for the link!

Rumal can be divided into two parts: let’s call them the front-end and the back-end.


The front-end consists of a Django project providing a web site that lets you submit new URLs (let’s call them “tasks”) and browse the results. Some key points of it are:

  • Data visualization: the main goal of this website is that of displaying Thug’s own results (e.g. the data saved in the MongoDB collections, or in the JSON files if you enable file logging) in a way that is both clear to understand and easy to browse and search. This involves studying basic data visualization problems such as contextual graphs, code highlighting, etc.
  • Metadata from external sources: Rumal should also provide users with a set of complementary tools, such as the ability to search a given domain or IP address in external services such as VirusTotal, DomainTools, UrlQuery and so on. Whenever possible, those lookups may be performed automatically by the application itself, and the search results may be shown along with Thug’s own data (e.g. GeoIP information could be drawn in a map).
  • Correlation: different analyses should be correlated and common patterns should be highlighted. This would involve clustering similar behaviours and recognizing exploit kits, but another key point may be that of identifying their spreading techniques and infection campaigns (e.g. how many analyses landing on the same exploit kit start from the same vulnerable CMS engine?)

Rumal is being designed as a sort of social network, where people can share their results, their comments and even their settings with other users or groups. Some random thoughts:

  • Integration: Right now, we are using Django’s builtin authentication and authorization system, but it would be nice to use OAuth.
  • Groups and circles: Groups are sets of users, just like in a Linux or Windows system; each user knows what groups he’s in, and he also knows who else is in the same group. Circles, as in Google+, are a different concept: they are private to each user, and only the circle’s owner knows who’s inside it. You know you are in somebody else’s circles, but you don’t know what’s the name of the exact circle he/she put you in, and you don’t even know who else is in the same circle. Right now, Rumal only has groups; it would be nice to introduce circles as well.
  • Social features: many other social features might be added, such as karma score, chats or forums, etc.


The back-end is a sort of daemon that does the real processing by running one or more Thug instances. There are quite a lot of issues to solve with this, and they mostly involve some of Thug’s internals.

For example, building a single-process, single-thread daemon that runs tasks in sequence is pretty straightforward. But, of course, we want Rumal to be able to process more than one task at once, for example by running different threads or processes and spawning one Thug instance per thread/process. Unfortunately, this is probably going to be a pain due to the fact that one of Thug’s core components is the Logger instance, which is treated as a sort of “global variable” maintained by the interpreter itself. You can access the same Logger instance from any of Thug’s components and, since the Logger is a plain Python object, you can append arbitrary properties and data to it, and retrieve its values from any other component. While this trick is extremely clever and handy, when you have more than one instance of Thug running (even if they run on different Python processes), those instances may be accessing the same Logger object and some race conditions may arise. This problem needs to be investigated and addressed. Possible technologies that may (or may not) help with that: virtualenv, Docker, …
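To illustrate the kind of shared state described above, here is a small sketch using Python’s standard logging module. It assumes the logger is obtained by name through logging.getLogger() (the name “Thug” and the attribute current_url are used here only as examples); it is not a copy of Thug’s actual code, and it only shows what happens inside a single interpreter.

```python
import logging


def start_analysis(url):
    """Hypothetical analysis entry point that stores state on the logger."""
    log = logging.getLogger("Thug")   # same object on every call in this interpreter
    log.current_url = url             # arbitrary attribute attached to the logger
    return log


first = start_analysis("http://example.com/a")
second = start_analysis("http://example.com/b")

# Both "instances" share one logger, so the second call silently
# overwrote the state stored by the first one.
shared = first is second
overwritten_url = first.current_url
```

If two analyses ran concurrently in the same interpreter, each would see whatever the other wrote last, which is exactly the sort of race that needs investigating.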


If you feel like getting involved in Rumal’s development, either as a GSoC student or as a contributor, you should make yourself familiar with these concepts:

  • Exploit Kits: what they are, how they work, how to analyze them
  • Thug: use it as much as you can, try to understand its output and study its internals
  • Python, Django, TastyPie, Bootstrap, jQuery: that’s what Rumal is woven from

Should you have any questions, please feel free to subscribe to the Honeynet Project’s GSoC mailing list, or ping us via Twitter: @PietroDelsante and @a_de_pasquale.

=== EDIT 2015/03/15 ===

You can find Rumal’s source code on GitHub now!

=== END EDIT ===

Foot Notes

[1] I suggest reading this article about exploit kits. Basically, exploit kits are normally used in drive-by download attacks, where a malicious user compromises a huge set of vulnerable sites (see here for an example). When a user loads the compromised page, some JavaScript code redirects them to the EK, which then serves one or more exploits such as malicious Flash, Java or PDF files. The exploit is not the malware itself: its only purpose is that of downloading the EK’s payload (the real malware) and executing it, thus infecting the victim. Exploit kits can also be used in spam attacks, where users get unsolicited emails containing a link that, if opened, redirects them to the EK’s landing page. Another attack that makes use of EKs is malvertising, where a malicious banner (often written in Flash) is pushed to a legitimate banner circuit that displays it on legitimate web pages, infecting their visitors.

[2] Most EKs only last a few days, sometimes only a few hours, then they move to a new domain and a new set of IP addresses: this way they can avoid being tracked, blacklisted or shut down by security firms and researchers.

Posted in GSoC, Honeynet Project, News

Thug and the art of web client tracking inspection

A few months ago I read the paper “Technical analysis of client identification mechanisms” [1]. The paper is really interesting and well worth your time. Just a brief excerpt from the abstract:

“In common use, the term “web tracking” refers to the process of calculating or assigning unique and reasonably stable identifiers to each browser that visits a website. In most cases, this is done for the purpose of correlating future visits from the same person or machine with historical data. Some uses of such tracking techniques are well established and commonplace. For example, they are frequently employed to tell real users from malicious bots, to make it harder for attackers to gain access to compromised accounts, or to store user preferences on a website. In the same vein, the online advertising industry has used cookies as the primary client identification technology since the mid-1990s. Other practices may be less known, may not necessarily map to existing browser controls, and may be impossible or difficult to detect. Many of them – in particular, various methods of client fingerprinting – have garnered concerns from software vendors, standards bodies, and the media.”

A few weeks ago I had a private chat with a dear friend of mine who is currently involved in the Trackography project [2] and is developing his own tool for such purposes [3]. During the conversation, the idea emerged of using Thug to analyze whether, and to what extent, a website makes use of some of the techniques described in [1]. The idea of using a honeyclient for such a completely different and useful purpose was really exciting for me, so I started thinking about the best way to do it. First I had to replace httplib2 with requests, since requests allows me to collect more details about a typical HTTP session. After that I started a new branch (which is still not public), which right now implements just a first single test, as shown below.

$ python --web-tracking
[2015-01-27 11:39:36] [MongoDB] Analysis ID: 54c76ae8d637083631c9a7ea
[2015-01-27 11:39:36] [window open redirection] about:blank ->
[2015-01-27 11:39:37] [PRIVACY] Cookie expiring at 2017-01-26 11:39:37 (more than 365 days from now)

Right now it’s not that useful, but it will be once all the metrics are implemented. Stay tuned, because Thug is turning into a honeyclient on steroids in the next weeks!
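The [PRIVACY] line in the sample output above flags a cookie that expires more than 365 days in the future. A check of that kind could be sketched as follows; this is not the code from the (unpublished) branch, just an illustration with made-up names.

```python
from datetime import datetime, timedelta


def cookie_is_long_lived(expires, now, max_days=365):
    """True if the cookie expires more than `max_days` after `now`."""
    return expires - now > timedelta(days=max_days)
```

For the values in the sample output (analysis run on 2015-01-27, cookie expiring on 2017-01-26) the check returns True.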

[1] Technical analysis of client identification mechanisms
[2] Trackography project
[3] Trackmap

Posted in Thug

Thug 0.6 released!

Thug 0.6 was released just a few hours ago. The most important change introduced during the 0.5 branch was a complete redesign of the logging infrastructure which is now completely modular. This makes adding (or removing) new logging modules extremely easy.

I made this change for a couple of reasons. The first one is that the logging code before Thug 0.5 was developed without a proper design, just adding modules as soon as I needed them. I usually hate this approach, so that alone would be enough to justify a complete redesign. But there was one more reason. I was aware that a few people out there were implementing their own logging modules and binding them in some really awful ways to the main code (someone said plugins?). I spent a lot of time documenting such changes. For this reason I will not dive into details in this post. But trust me. Extending Thug logging with your own modules should be an easy task now. Hopefully. Let me add that additional logging modules would be really appreciated, so if you think your cool module should be included in the source tree please feel free to contact me.

Moreover, I worked a lot to improve the reliability and the performance of the analyses. Sometimes this required just little changes, sometimes not. But I am really satisfied with such changes and the improvements they produced.

I have great plans for the 0.6 branch. First of all, I will continue focusing on reliability and performance improvements (probably I never stopped doing it from the first release). Moreover, I would love to work on the distributed analysis approach which we started experimenting with during the Google Summer of Code 2013, and on the detection of client tracking mechanisms.

Stay tuned because I just started having fun!

Posted in Projects, Thug

Malware-serving theaters for your android phones – Part 2

In this post I will analyze the Android APK files that my friend Pietro Delsante from the Honeynet Project Sysenter Chapter talks about in his previous post (thank you Pietro). The files are all named “video.apk” and these are the MD5 and SHA256 hashes:

video.apk 10859e82697955eb2561822e14460463 a36ecd528ecd80dadf3b4c47952aede7df3144eb9d2f5ba1d3771d6be2261b62

video.apk 91f302fd7c2d1b8fb54248ea128d19e0 8e0a2f6b7101e8caa61a59af4fdfc5b5629b8eac3a9aafcc1d0c8e56b4ddad15

video.apk f6ad9ced69913916038f5bb94433848d 4c7c0bd7ed69614cb58908d6a28d2aa5eeaac2ad6d03cbcad1a9d01f28a14ab9
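If you want to check a sample of your own against the hashes listed above, both digests can be computed in one pass with Python’s hashlib. The function name is just an example.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return (md5_hex, sha256_hex) for the file at `path`."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```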

The three APKs are almost identical: they share the same certificate and much more (I will cover the differences later). I started by having a look at the first sample, 10859e82697955eb2561822e14460463; this is the content of its AndroidManifest.xml file:


As you can see there are the following permissions (details from the official Android documentation):

  • android.permission.SEND_SMS, which allows the app to send SMS messages;
  • android.permission.INTERNET, which allows the app to open network sockets;
  • android.permission.RECEIVE_SMS, which allows the app to monitor incoming SMS messages, to record or perform processing on them;
  • android.permission.READ_PHONE_STATE, which allows read only access to phone state.
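A permission list like the one above can be pulled out of a decoded AndroidManifest.xml (e.g. the one produced by apktool) with a few lines of Python. This is a sketch with made-up function names; the only fixed piece is the standard Android XML namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used for android:* attributes in manifest files.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def requested_permissions(manifest_xml):
    """Return the permission names declared in a decoded manifest string."""
    root = ET.fromstring(manifest_xml)
    return [
        element.get(ANDROID_NS + "name")
        for element in root.iter("uses-permission")
    ]


def can_send_and_intercept_sms(permissions):
    """True if the app can both send SMS messages and see incoming ones."""
    needed = {"android.permission.SEND_SMS", "android.permission.RECEIVE_SMS"}
    return needed.issubset(permissions)
```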

This is consistent with the name that the main antivirus vendors use for this kind of malware (from VirusTotal [1][2][3]):


Having a look at the application code we see that it is split into two parts:

  • version.eleven.MainActivity is the class that is run when the application is started from the Android launcher;
  • version.eleven.SmsReciver (sic) is a subclass of BroadcastReceiver, i.e. it runs when there are incoming SMS messages.

This is the full structure of the APK after decoding it with apktool:

apktool tree

There are some resources that look promising, such as the html hierarchy inside “/assets” and the file “/res/raw/settings.json”, whereas the files inside “/res/values/*.xml” do not yield anything interesting. The app name is Еро Видео (Russian for “Ero Video”) and we also have a pretty windows-like icon:


Looks promising, we are dealing with an app that masks itself as a porn video. But poor Pietro wasn’t looking for some porn, he was headed to the theater to see a show with friends! 😛

At this point, before analyzing the code, I chose to run the app manually in a sandbox to check whether it was a real video or not, and to see what is presented to the user, which kind of network traffic it generates, what SMS messages are sent, etc. The following is a slightly NSFW video of the execution of the app:


As a side note, there also seems to be some debug-like output on the ADB logcat:



Time to have a look at the code… It has been decoded with a mixture of apktool/dex2jar/jad, and the source code has then been slightly fixed using manual bytecode inspection. What follows is the MainActivity that is executed when the app is launched. While an overlay progress dialog is shown, some settings are loaded in the background. Then a WebView, which simply stated is a web browser, is started as the application content view, i.e. what the user sees; it also has a JavaScript interface called “webapi” set up by the method “addJavascriptInterface()”. The WebView is opened on the URL of the asset page “/assets/html/index.html”, and as soon as the page loads the overlaid progress dialog is hidden and some JavaScript code is executed with the Android version as a parameter.


The calls to “Settings.load()” and “.save()” get and put some data (first, sentSms, time) from the Android shared preferences storage:


Here’s an example of the settings that might be saved inside “/data/data/version.eleven/shared_prefs/settings.xml”:

<?xml version='1.0' encoding='utf-8' standalone='yes' ?>
<boolean name="first" value="false" />
<long name="time" value="0" />
<boolean name="sentSms" value="false" />


The method “MainActivity.loadSettings()” loads instead the file “/res/raw/settings.json” with “Functions.loadAndDecode()” and checks its content against the International Mobile Subscriber Identity (IMSI) retrieved through “Functions.getImsi()”.



The IMSI is a 15-digit number used by all cellular networks to identify a single user of the network. Its first five or six digits are the concatenation of the Mobile Country Code (MCC) and the Mobile Network Code (MNC). The MCC+MNC tuple uniquely identifies a mobile phone operator and its country of operation. Here we are dealing with a dictionary of operators (op*) and, for every operator, one or more MCC+MNC (codes) along with some number+text (items).
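The matching of the IMSI against the operator dictionary can be sketched like this. The exact layout of “settings.json” is not reproduced here: the structure below (operator name mapping to “codes” and “items”) and the sample values used to try it out are assumptions based on the description above, and the function name is made up.

```python
def find_operator(imsi, operators):
    """Return the name of the operator whose code prefixes the IMSI.

    operators: dict like {"op1": {"codes": [...], "items": [...]}}
    (hypothetical layout). Codes may be MCC+MNC (5 or 6 digits) or
    just the MCC (3 digits), so the longest prefix is tried first.
    """
    for length in (6, 5, 3):
        prefix = imsi[:length]
        for name, entry in operators.items():
            if prefix in entry["codes"]:
                return name
    return None
```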


The “settings.json” file contains operators (sometimes MCC+MNC, sometimes only the MCC) from the following countries: Russian Federation, Ukraine, Lithuania, Azerbaijan, Latvia, Estonia, Armenia, Israel, Austria, Belgium, Bulgaria, Belarus, Switzerland, Cyprus, Czech Republic, Germany, Denmark, Spain, Finland, France, Hong Kong, Croatia, Hungary, Iraq, Jordan, Cambodia, Kuwait, Luxembourg, Montenegro, Macedonia, Malaysia, Netherlands, Norway, Portugal, Qatar, Serbia, Saudi Arabia, Slovenia, Slovakia, Taiwan, Kazakhstan, Poland. This file has some minor differences between the three analyzed APKs.


Now, as for the HTML part, here’s the page source header. When the page is loaded, a JS animation starts a slideshow of some porn images (pic*.jpg), and in the meantime the Android method “Functions.callJsCallbackAndroidVersion(Build.VERSION.RELEASE)” calls the JavaScript function “androidVersion()” in order to set some kind of flag for Gingerbread devices.



When the user clicks on the blue play button, the JavaScript code “goNext()” is run. First, it hides some part of the HTML page (“#page1”) and shows another (“#page2”); that second part is slightly different between the three APKs, but all of them show a porn-related link (Ваша ссылка), a so-called password (Ваш пароль) and a GO button (Перейти). In addition, “goNext()” runs “sendSms()”, which in turn calls the Android method “WebApi.sendSms()” either directly or through the chain “prompt()”, “onJsPrompt()”, “textToCommand()”.



The function first checks that at least one day has passed since the last successful run, then starts a separate thread that runs “threadOperationRun()”. That function retrieves a “MainActivity.settings.smsList” list of number+text, previously populated using “settings.json”, then sends every message using “Functions.sendSms(number, text)”.


Every number to whom an SMS was sent is also saved into “MainActivity.settings.phoneList”, then the shared_prefs persistent settings’ timestamp is updated. This is a sample of the prefs after some SMS messages have been sent:

<?xml version='1.0' encoding='utf-8' standalone='yes' ?>
<boolean name="first" value="false" />
<long name="time" value="1389222812010" />
<boolean name="sentSms" value="true" />
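The once-a-day check and the “time” value in the preferences above can be sketched as follows. This assumes that “time” is a Unix timestamp in milliseconds (as returned by Java’s System.currentTimeMillis()); the function names are made up and this is not the decompiled code.

```python
from datetime import datetime, timezone

ONE_DAY_MS = 24 * 60 * 60 * 1000


def may_send_again(last_sent_ms, now_ms):
    """True if at least one day has passed since the last successful run."""
    return now_ms - last_sent_ms >= ONE_DAY_MS


def prefs_time_to_utc(time_ms):
    """Convert the stored millisecond timestamp to a UTC datetime (whole seconds)."""
    return datetime.fromtimestamp(time_ms // 1000, tz=timezone.utc)
```

Under that assumption, the value 1389222812010 corresponds to the evening of 8 January 2014 (UTC), and the initial value 0 means the check passes the first time the play button is clicked.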


Lastly, here is the code that runs when there are incoming SMS messages. To keep it short, if the phone number of the sender of the incoming message is in the list “MainActivity.settings.phoneList”, then the message is discarded.



To wrap up, we are dealing with a fake “Ero Video” porn app that sends paid SMS messages at most once a day, hiding any subsequent reply to these messages. It has no background service (and no persistence/self boot), and relies entirely on social engineering, i.e. the user has to open the app and click the play button. Moreover, as you can see in the video, the Android system shows a popup warning when the first SMS is sent, hopefully lowering the click-through rate.

As for the fraud and/or monetization scheme, I still need to understand the meaning of the text inside the messages. For example, on hxxp://zona-people[.]com/oferta.php (Russian) at the bottom of the page we can see that the numbers inside “settings.json” are billed 6.5€ / 9$ per SMS, but unfortunately there is no clue about the meaning of the text. Information and feedback are always welcome; you can find me on Twitter.

Posted in News

Malware-serving theaters for your android phones – Part 1

Some nights ago I was heading to a local theater with some (non-nerd) friends. We did not recall very well the address, so I brought out my phone (LG Nexus 4 with Android 4.4.2 and Google Chrome) and googled for it. I found the theater’s official site and started looking for the contact info, when Chrome suddenly opened a popup window pointing me to a Russian web site ( urging me to update my Flash Player. I laughed loudly and showed them to my (again, totally non-nerd) friends saying that the site had been owned. One of them went on and opened the site with her own phone (Samsung Galaxy S Advance with Android 4.4.1 and the default Android WebKit browser). To make a long story short, after a few instants her phone was downloading a file without even asking her for confirmation. So: Chrome on my Nexus 4 was using social engineering to have me click on a link and manually download the file; Android’s WebKit on her Galaxy S Advance was instead downloading the file straight away: interesting! However, we were a bit late and we had to run for the comedy, so I did not even bother to see what the heck she had downloaded, I only made sure she hadn’t opened it. I thought it was just the usual exploit kit trying to infect PCs by serving fake Flash Player updates, seen tons of those. While waiting for the comedy to begin, I quickly submitted the compromised site to three different services, the first three ones that came to my mind: HoneyProxy Client, Wepawet and Unmask Parasites, then turned off my phone and enjoyed the show.

The day after, I decided to spend a few minutes analyzing that exploit kit (you know, just in case…). First of all, the compromised site was built with Joomla 1.7, an older release that has quite a long list of security updates in its short history and is now deprecated in favor of Joomla 2.5. I wish I had access to that web server’s logs; those would be quite fun to read!

However, looking at the source code of the compromised pages, I saw that the malicious JavaScript was injected at the very beginning of the page:



As you can see (even if the image is cropped), the JavaScript is composed of two main IF clauses. The first one checks whether the User-Agent string may indicate a robot, in which case nothing is done; if it looks like a real browser instead, the code calls a function that creates an iframe pointing to “ /?id=ifrm” and adds it to the page. Then, if the User-Agent string indicates a mobile phone, the second IF clause also uses some basic JavaScript functions to trigger a full-page redirection to the same URL, but passing a different parameter: “/?id=mob”. Hmm, sounds interesting: an exploit kit with some code specific to mobile phones. I had never seen that, but maybe that’s only because I had been working on other topics lately.
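The decision logic described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this post, not the injected script itself: the two regular expressions are assumptions (the real checks are not reproduced here), and the sketch treats the two IF clauses as independent, which the description suggests but does not state outright.

```python
import re

# Assumed patterns: the real checks used by the injected script are not shown
# in this post, so these are illustrative stand-ins only.
BOT_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)
MOBILE_PATTERN = re.compile(r"android|iphone|ipad|mobile", re.IGNORECASE)


def injected_actions(user_agent):
    """Return the actions the injected script would take for a User-Agent.

    First IF: anything that does not look like a robot gets the iframe.
    Second IF: anything that looks like a mobile phone also gets the
    full-page redirection with the "mob" parameter.
    """
    actions = []
    if not BOT_PATTERN.search(user_agent):
        actions.append("iframe /?id=ifrm")
    if MOBILE_PATTERN.search(user_agent):
        actions.append("redirect /?id=mob")
    return actions


if __name__ == "__main__":
    for ua in (
        "Googlebot/2.1 (+http://www.google.com/bot.html)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "Mozilla/5.0 (Linux; U; Android 4.0.3; GT-I9100) Chrome/18.0 Mobile Safari/535.19",
    ):
        print(ua, "->", injected_actions(ua))
```

Modelling the checks this way makes it easy to see why the choice of browser personality matters so much when analyzing the kit.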

I reported the breach to the site owners right after finishing the analysis, on December 30, and they answered on December 31 saying they would clean it up as soon as possible. This afternoon (January 7) I checked and the site was clean, but tonight it is compromised again, so it looks like the owners did not patch the vulnerability and the exploit is probably being spread in an automated way. A quick Google search seems to confirm this hypothesis: the malicious code shows up in more than 82,100 different pages, and those are probably only the ones where the injection failed, so that the JavaScript code is displayed as text instead of being executed:

Well, after that, I looked at the results of the three scans I had run the night before. To my surprise, there was almost nothing in them:

  1. HoneyProxy Client did show a connection towards “”, which created an iframe pointing to “”, which in turn contained some obfuscated JavaScript; however, no exploit was run or, at least, no interesting file (PE EXE, PDF, SWF or the like) was downloaded;
  2. Wepawet died on me several times while I was trying to run it, so I gave up on it;
  3. Unmask Parasites tagged the site as suspicious because it had found some JavaScript code outside the proper <script> tags.

And that was all. So I decided to run the site through Thug with the default personality (winxpie60) and, man, that was disappointing! Nothing found. Absolutely nothing. Not even a single tiny call to a .ru domain or anything of the like. The only external site contacted was serving legitimate content for the theater’s site.

Fortunately, Thug’s author Angelo “Buffer” Dell’Aera (our Boss, our Leader, our Shining Star) was wise enough to provide his wonderful tool with an awesome set of different personalities: if the exploit kit did not like Internet Explorer 6, maybe I could fool it with a Galaxy S II with Google Chrome 18 and Android 4.0.3, since it was checking for mobile phones. Guess what, that did the trick! This time, after a few seconds, Thug got redirected to “”, which in turn pointed to “”, then to “” and “”, from which three different APKs were downloaded.

For those interested, this is a small excerpt of Thug’s JSON logs:

    "connections": [
        {
            "source": "hxxp://www.[compromised_site].com/",
            "destination": "hxxp:// /?id=mob",
            "flags": {},
            "method": "href"
        },
        {
            "source": "hxxp://www.[compromised_site].com/",
            "destination": "hxxp:// /?id=mob",
            "flags": {},
            "method": "window open"
        },
        {
            "source": "hxxp:// /?id=mob",
            "destination": "hxxp:// /lpadultbill/d.php?id=u7be70c982f0a1226ae890bc4d7e3dfe9/",
            "flags": {},
            "method": "meta"
        },
        {
            "source": "hxxp:// /lpadultbill/d.php?id=u7be70c982f0a1226ae890bc4d7e3dfe9/",
            "destination": "hxxp:// /lpadultbill/d.php?id=u7be70c982f0a1226ae890bc4d7e3dfe9%2F",
            "flags": {},
            "method": "http-redirect"
        },
        {
            "source": "hxxp:// /tmpsrc/d586495364701f9ec770e3b9df2df318/video.apk",
            "destination": "hxxp:// /lpadultbill/d.php?id=u7be70c982f0a1226ae890bc4d7e3dfe9/",
            "flags": {},
            "method": "window open"
        },
        {
            "source": "hxxp:// /lpadultbill/d.php?id=u7be70c982f0a1226ae890bc4d7e3dfe9/",
            "destination": "hxxp:// /lpadultbill/d.php?id=u7be70c982f0a1226ae890bc4d7e3dfe9%2F",
            "flags": {},
            "method": "http-redirect"
        },
        {
            "source": "hxxp:// /tmpsrc/d586495364701f9ec770e3b9df2df318/video.apk",
            "destination": "hxxp:// /lpadultbill/d.php?id=u7be70c982f0a1226ae890bc4d7e3dfe9",
            "flags": {},
            "method": "href"
        },
        {
            "source": "hxxp:// /tmpsrc/d586495364701f9ec770e3b9df2df318/video.apk",
            "destination": "hxxp:// /lpadultbill/d.php?id=u7be70c982f0a1226ae890bc4d7e3dfe9",
            "flags": {},
            "method": "window open"
        },
        {
            "source": "hxxp:// /lpadultbill/d.php?id=u7be70c982f0a1226ae890bc4d7e3dfe9",
            "destination": "hxxp:// /lpadultbill/d.php?id=u7be70c982f0a1226ae890bc4d7e3dfe9",
            "flags": {},
            "method": "http-redirect"
        }
    ]
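A log like the excerpt above is easier to read once repeated hops are collapsed. The following is a minimal Python sketch of that idea, assuming that “connections” is a top-level key of the JSON document; the sample data is hypothetical and uses placeholder `.example` host names, since the real hosts are not reproduced in this post.

```python
import json
from collections import Counter

# Hypothetical sample that mirrors the structure of the excerpt above.
# Host names are placeholders, not the real ones.
SAMPLE_LOG = """
{
  "connections": [
    {"source": "hxxp://compromised.example/",
     "destination": "hxxp://landing.example/?id=mob",
     "flags": {}, "method": "href"},
    {"source": "hxxp://compromised.example/",
     "destination": "hxxp://landing.example/?id=mob",
     "flags": {}, "method": "window open"},
    {"source": "hxxp://landing.example/?id=mob",
     "destination": "hxxp://payload.example/d.php?id=abc",
     "flags": {}, "method": "meta"}
  ]
}
"""


def unique_hops(connections):
    """Return (source, destination) pairs in first-seen order, without repeats."""
    seen = set()
    hops = []
    for conn in connections:
        hop = (conn["source"], conn["destination"])
        if hop not in seen:
            seen.add(hop)
            hops.append(hop)
    return hops


def methods_by_count(connections):
    """Count how many connections were triggered by each method."""
    return Counter(conn["method"] for conn in connections)


log = json.loads(SAMPLE_LOG)
hops = unique_hops(log["connections"])
method_counts = methods_by_count(log["connections"])

if __name__ == "__main__":
    for source, destination in hops:
        print(f"{source} -> {destination}")
    print(dict(method_counts))
```

Collapsing the hops this way keeps the redirection chain visible, while the per-method counts show which mechanisms (href, window open, meta, http-redirect) the kit relies on.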

The same result can be achieved by selecting the iPad personality (ipadsafari7) or any other Android one (galaxy2chrome18, galaxy2chrome25, galaxy2chrome29), so it looks like the exploit kit does not really distinguish between the operating systems that mobile devices actually run: it always serves an Android app.
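Trying several personalities in a row is easy to script. The sketch below only builds the command lines, one per personality, using Thug’s -u (--useragent) option. The executable name “thug” and the target URL are assumptions: depending on how Thug is installed it may have to be launched as a Python script instead, and the URL is just a placeholder.

```python
import shlex

# Personalities mentioned in this post; each one is passed to Thug's
# -u/--useragent option.
PERSONALITIES = [
    "winxpie60",
    "galaxy2chrome18",
    "galaxy2chrome25",
    "galaxy2chrome29",
    "ipadsafari7",
]


def build_thug_commands(url, personalities=PERSONALITIES, executable="thug"):
    """Build one Thug command line (as an argument list) per personality.

    The executable name is an assumption about the local installation.
    """
    return [[executable, "-u", personality, url] for personality in personalities]


# Placeholder URL, not the real compromised site.
commands = build_thug_commands("http://compromised.example/")

if __name__ == "__main__":
    for command in commands:
        # Print the command instead of running it; pass the list to
        # subprocess.run() to actually launch the analysis.
        print(shlex.join(command))
```

Keeping each command as an argument list avoids shell quoting problems when the list is later handed to subprocess.run().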

The three APK files are actually the same app, with small configuration changes that make each one talk to a different Command&Control server, but we’ll talk about this in a later post. For now, we’ll only say that all three are named “video.apk” and that their MD5 sums are 10859e82697955eb2561822e14460463, 91f302fd7c2d1b8fb54248ea128d19e0 and f6ad9ced69913916038f5bb94433848d.

To sum things up, in this post we’ve looked at a peculiar exploit kit that is being actively spread by some automated means and has already compromised several thousand sites. The exploit kit behaves in a rather unusual way, as it seems to have been designed with special attention to mobile users (currently the only ones that get infected by it), and it distributes malicious APKs that are (more or less) well recognized by AV vendors on VirusTotal (23/47). Last but not least, Angelo “Buffer” Dell’Aera confirmed that it’s the first time he has seen APKs being distributed that way by an exploit kit and, to his pride, Thug is able to get them all!

Stay tuned for some further analysis of those APKs by my friend and fellow Sysenter Chapter contributor Andrea De Pasquale!

Posted in Exploits, News, Thug