GSoC 2015: Introducing Thug’s Rumal

Disclaimer

This post is mainly intended for GSoC 2015 students who might want to consider contributing to a fairly new Honeynet Project tool called Rumal. If you are interested in contributing to Rumal outside of GSoC, you most probably already know Thug, so you can safely skip the first part.

Thug

Before talking about Rumal, we’d better spend a few words on Thug; if you already know it, you can skip this part. Thug was born as a tool to study exploit kits [1], and it does so by emulating a real browser complete with a set of plugins such as Adobe Reader, Flash and Java. In a few words (Angelo, please don’t kill me), when you point it at a compromised web page, it “crawls” it and starts fetching and executing any internal or external JavaScript, following redirects and downloading files just like a browser would. When it encounters files it cannot analyze by itself (such as Flash, Java and PDF), it passes them to external tools. Thug’s results are then collected in a variety of formats, including a MongoDB database.

Unfortunately, while it’s extremely good at what it does, Thug still has one big problem: you need to be a real h4x0r to use it. Its results can be quite hard to read and most people don’t have the skills to understand them, so they only check whether they can get the exploit kit’s binary payload (the malware’s *.exe file) and, if they can’t, they move along and say the page is not malicious, or maybe that the exploit kit is inactive [2]. That’s where a good web GUI would come in handy, and that’s exactly what Thug’s Rumal was born for: there’s plenty of information that can be extracted from Thug’s results and that could lead you to the conclusion that the page is actually malicious and you’d better check your Windows PC twice 🙂

Thug’s Rumal

Now, let’s come to Rumal. We have been developing it for a few weeks now, so it’s still a very young project. We still haven’t released the code on GitHub because we have to polish it a bit first, but we will do so as soon as possible. EDIT 2015/03/15: We finally released Rumal’s source code on GitHub! Please see below for the link!

Rumal can be divided into two parts: let’s call them the front-end and the back-end.

Front-end

The front-end consists of a Django project providing a web site that lets you submit new URLs (let’s call them “tasks”) and browse the results. Some key points of it are:

  • Data visualization: the main goal of this website is to display Thug’s own results (e.g. the data saved in the MongoDB collections, or in the JSON files if you enable file logging) in a way that is both easy to understand and easy to browse and search. This involves studying basic data visualization problems such as contextual graphs, code highlighting, etc.
  • Metadata from external sources: Rumal should also provide users with a set of complementary tools, such as the ability to look up a given domain or IP address in external services like VirusTotal, DomainTools, UrlQuery and so on. Whenever possible, those lookups may be performed automatically by the application itself, and the search results may be shown along with Thug’s own data (e.g. GeoIP information could be drawn on a map).
  • Correlation: different analyses should be correlated and common patterns should be highlighted. This would involve clustering similar behaviours and recognizing exploit kits, but another key point may be identifying their spreading techniques and infection campaigns (e.g. how many analyses landing on the same exploit kit start from the same vulnerable CMS engine?)
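To make the correlation idea a bit more concrete, here is a minimal Python sketch of the last question in the list: how many analyses that land on the same exploit kit host started from the same CMS. The records and their field names (start_url, cms, final_url) are hand-written assumptions for illustration only; they are not Thug’s real output format, and real correlation would need far more than a host name.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical, hand-written records: NOT Thug's real output schema.
analyses = [
    {"start_url": "http://blog-a.example/index.php", "cms": "WordPress",
     "final_url": "http://ek1.example/landing"},
    {"start_url": "http://shop-b.example/", "cms": "Joomla",
     "final_url": "http://ek1.example/landing?id=2"},
    {"start_url": "http://news-c.example/", "cms": "WordPress",
     "final_url": "http://ek1.example/gate"},
    {"start_url": "http://wiki-d.example/", "cms": "Drupal",
     "final_url": "http://ek2.example/x"},
]


def cms_by_landing_host(records):
    """Group analyses by the host they end on and count the CMS they started from."""
    groups = defaultdict(Counter)
    for record in records:
        host = urlparse(record["final_url"]).hostname
        groups[host][record["cms"]] += 1
    return groups


groups = cms_by_landing_host(analyses)
for host, counts in sorted(groups.items()):
    # most_common() lists the CMS engines seen for this host, most frequent first
    print(host, counts.most_common())
```

Grouping by host name is the crudest possible clustering key; since exploit kits change domains often, a real implementation would probably need to compare URL patterns or behaviours instead.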

Rumal is being designed as a sort of social network, where people can share their results, their comments and even their settings with other users or groups. Some random thoughts:

  • Integration: Right now, we are using Django’s builtin authentication and authorization system, but it would be nice to use OAuth.
  • Groups and circles: Groups are sets of users, just like in a Linux or Windows system; each user knows which groups they are in, and also knows who else is in the same group. Circles, as in Google+, are a different concept: they are private to each user, and only the circle’s owner knows who’s inside it. You know you are in somebody else’s circles, but you don’t know the name of the exact circle they put you in, and you don’t know who else is in the same circle. Right now, Rumal only has groups; it would be nice to introduce circles as well.
  • Social features: many other social features might be added, such as karma score, chats or forums, etc.
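The difference between groups and circles is mostly a matter of who is allowed to see the member list. The sketch below shows one way to express those visibility rules in plain Python. The class and function names are made up for this example and do not come from Rumal’s code base, where this would more likely live in Django models.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Shared set of users: every member can see every other member."""
    name: str
    members: set = field(default_factory=set)

    def visible_members(self, viewer):
        # Members see the full list; outsiders see nothing.
        return set(self.members) if viewer in self.members else set()


@dataclass
class Circle:
    """Private set of users: only the owner can see who is inside."""
    owner: str
    name: str
    members: set = field(default_factory=set)

    def visible_members(self, viewer):
        # Only the owner can list a circle's members.
        return set(self.members) if viewer == self.owner else set()


def circle_owners_including(user, circles):
    """Tell a user who has circled them, without the circle name or other members."""
    return {circle.owner for circle in circles if user in circle.members}


analysts = Group("analysts", {"alice", "bob"})
trusted = Circle("alice", "trusted", {"bob", "carol"})

print(analysts.visible_members("bob"))             # bob sees the whole group
print(trusted.visible_members("bob"))              # bob cannot list the circle
print(circle_owners_including("bob", [trusted]))   # bob only learns that alice circled him
```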

Back-end

The back-end is a sort of daemon that does the real processing by running one or more Thug instances. There are quite a lot of issues to solve with this, and they mostly involve some of Thug’s internals.

For example, building a single-process, single-thread daemon that runs tasks in sequence is pretty straightforward. But, of course, we want Rumal to be able to process more than one task at once, for example by running different threads or processes and spawning one Thug instance per thread/process. Unfortunately, this is probably going to be a pain because one of Thug’s core components is the Logger instance, which is treated as a sort of “global variable” maintained by the interpreter itself. You can access the same Logger instance from any of Thug’s components and, since the Logger is a plain Python object, you can attach arbitrary properties and data to it and retrieve their values from any other component. While this trick is extremely clever and handy, when you have more than one instance of Thug running (even if they run in different Python processes), those instances may be accessing the same Logger object and race conditions may arise. This problem needs to be investigated and addressed. Possible technologies that may (or may not) help with that: virtualenv, Docker, …
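As a starting point for that investigation, here is a rough sketch of the “one external process per task” approach, using only Python’s standard library. It assumes that launching each analysis as a separate operating system process is enough to keep interpreter-level state apart, which, as noted above, still has to be verified against Thug’s internals. The command is deliberately a parameter: the example uses a placeholder that just echoes the URL, because the real Thug command line is not covered in this post.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_task(command, url, timeout=300):
    """Run one analysis in its own OS process; return (url, exit code, stdout)."""
    result = subprocess.run(
        command + [url], capture_output=True, text=True, timeout=timeout
    )
    return url, result.returncode, result.stdout


def run_tasks(command, urls, workers=2):
    """Process several URLs at once, at most `workers` child processes at a time."""
    # Threads are only used to wait on the children; the work happens in the
    # child processes, so no Python objects are shared between analyses.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda url: run_task(command, url), urls))


# Placeholder command (an assumption for this demo): it only echoes the URL.
# A real back-end would put the actual analysis command here instead.
placeholder = [sys.executable, "-c", "import sys; print('analyzed', sys.argv[1])"]

for url, code, out in run_tasks(placeholder, ["http://a.example/", "http://b.example/"]):
    print(url, code, out.strip())
```

Even with separate processes, the instances may still share external resources such as log directories or the MongoDB database, so this kind of isolation only addresses part of the problem.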

Conclusions

If you feel like getting involved in Rumal’s development, either as a GSoC student or as a contributor, you should make yourself familiar with these concepts:

  • Exploit Kits: what they are, how they work, how to analyze them
  • Thug: use it as much as you can, try to understand its output and study its internals
  • Python, Django, TastyPie, Bootstrap, jQuery: that’s what Rumal is woven from

Should you have any questions, please feel free to subscribe to the Honeynet Project’s GSoC mailing list, or ping us via Twitter: @PietroDelsante and @a_de_pasquale.

=== EDIT 2015/03/15 ===

You can find Rumal’s source code on GitHub now!

=== END EDIT ===

Footnotes

[1] I suggest reading this article about exploit kits. Basically, exploit kits are normally used in drive-by download attacks, where a malicious user compromises a huge set of vulnerable sites (see here for an example). When a user loads a compromised page, some JavaScript code redirects them to the EK, which then serves one or more exploits such as malicious Flash, Java or PDF files. The exploit is not the malware itself: its only purpose is to download the EK’s payload (the real malware) and execute it, thus infecting the victim. Exploit kits can also be used in spam attacks, where users get unsolicited emails containing a link that, if opened, redirects them to the EK’s landing page. Another attack that makes use of EKs is malvertising, where a malicious banner (often written in Flash) is pushed to a legitimate advertising network that displays it on legitimate web pages, infecting their visitors.

[2] Most EKs only last a few days, sometimes only a few hours, then they move to a new domain and a new set of IP addresses: this way they can avoid being tracked, blacklisted or shut down by security firms and researchers.
