Blog | Context Information Security

Most of the back-end software and infrastructure devices, to which we rarely give a second thought, run Linux or some exotic manifestation of Unix, depending on the specific task of the node in question. Perhaps we do not give these back-end systems much thought because they are so reliable, and are supposedly more 'failproof' than your common Windows workstation that can surprise you every other week. Or perhaps it is to avoid a headache.

It also seems to be a common belief that Linux systems are relatively secure, as making anything run on Linux as intended is a lot of effort. Even more effort would have to go into making it run malware, or to developing an exploit which would 'just work' on the unique snowflake Unix environment, lovingly configured by a systems administrator.

Nevertheless, it is these little bespoke wonders – routers, switches, servers, load-balancers, firewalls, DNS servers and other less famous characters – which provide for the "interconnectedness of all things" (Douglas Adams) in the background of their more famous front-end counterparts.

It all began with a crash

Unfortunately for everyone who appreciates the numerous benefits of the global network of information and services that we have come to know as the Internet, malicious perpetrators (sometimes still referred to as hackers, because the 90s were not that long ago, no matter what you say) are getting better at what they do in response to technology slowly but surely becoming more secure.

Perhaps one of the first major vulnerabilities to shake the community's belief that Unix is close to untouchable was Shellshock, an easy-to-exploit and nearly omnipresent bug in how the Bash shell parsed environment variables. It allowed attackers to remotely supply their own environment variable values, and the target system would execute whatever commands trailed the function definitions in those values.

In a way, every vulnerability we find – even if it is 20 years after it was written – is a good thing because we can still learn from it. Most importantly, we can adapt, which is crucial in not letting the attackers have the upper hand. In the case of Shellshock, it certainly got the community thinking and waking up from a false sense of security where it concerns Unix systems. It was inevitable that similar issues would eventually surface.

Once on a long rainy afternoon, a curious Google engineer noticed temperamental behaviour that was unusual even for the SSH service he was debugging. Segmentation faults (segfaults) are not an uncommon problem with low-level machine-to-machine interactions, where high-level end-points (including the homo sapiens kind) are involved. However, after years of experience dealing with buffer overflows, reliably occurring segfaults can cause a bit of a nervous twitch, especially in the smart guys at Google, who always seem to be paying attention. Another thing about mysterious segfaults: you can bet it is something to do with C.

Turns out it was. Following a long and, as we can only begin to imagine, tedious in-depth investigation of the SSH connection at fault, Google discovered that the ubiquitously used GNU C Library (glibc) was the real culprit in this case.

Ghost of libc past

It is not the first time something wrong has been found with glibc. The GHOST vulnerability from early 2015 made a significant amount of noise in its time. Its difficult and limited exploitability, however, never made it quite as famous as Shellshock. In a way it provided a relative level of reassurance that Linux is still a tough cookie to mess with.

Oddly enough, GHOST was also related to hostname resolution via the gethostbyname() functions in glibc, which in retrospect should have led to further investigation of how the library interacts with DNS. But few versions of glibc were affected, and patching was easy; alternatively, users could just opt for using getaddrinfo() instead, which was deemed safe. As a result, the topic did not stir up too much concern and was forgotten rather quickly. However, sometimes ghosts come back to haunt the conscience of the guilty.

When Google discovered the new issue with glibc and DNS, they also found out that it was not even that new. The bug had already been reported in July 2015 (hence the weird and slightly misleading CVE number, CVE-2015-7547), and the glibc team had quietly accepted their fate and worked away at a fix. A couple of researchers at Red Hat had also been working independently on the issue and on exploit development. The combined interest most certainly sped up the resolution of the issue, and potentially made it more effective.

It always looks better when you come out with a hot new vulnerability AND a sturdy fix AND plenty of ways it could have been exploited. Google and Red Hat teams also fixed a couple of other issues in glibc while they were at it, such as enhancing the dynamic shared library loader and local storage and fixing a bug in the POSIX realtime support.

So What?

The significance of this bug lies in it affecting nearly all network devices to an extent, and being deeply involved in a process that keeps the Internet going – Domain Name Resolution is serious business. Everyone must have had that time when their DNS went down, and they had to use IP addresses to get to their favourite social networks (obviously before we learned to use Google's DNS or maybe before it was there for us to use).

Something that endangers the very backbone of the Internet is bound to get a lot of attention. For much the same reason, despite there being a fix, it is (a) not that easy to apply because there is downtime involved and (b) impossible to guarantee effective remediation on 'all' of the involved infrastructure. And unfortunately, Linux will just not work without the C library, because as it turns out, machines really like chatting to fellow machines in C, but we would not really want them speaking French.

Additionally, consider the special child that the Internet of Things (IoT) has shown itself to be in the past years: none of those smarts work without DNS. Because these interconnected smart devices need to be very lightweight and dependable, a vast majority of them will run on Linux with glibc. Even bespoke embedded systems may turn out to be cleverly disguised Unix. However, there are rumours that Android is unaffected by this flaw - must be all the Java getting in the way. (In reality, Android just does not use glibc.)

The increasingly popular Bitcoin service, no less important than real coin, also uses glibc quite a lot under the hood. Routing money via the Internet may have its dangers, but routing money over a global network vulnerable to remote code execution via domain name resolution requests... that just sounds dreadful. Anyway, people are concerned, and for a good reason, it seems.

What is worse, there is extensive research into the possibility that a glibc exploit could pass through DNS caches, which would mean the hosts behind them are exposed to the same flaw. At the moment it seems like a stretch and way too much effort. However, the official claim is that this is possible at least in theory. Therefore even hosts behind DNS caches cannot be seen as safe.

The vulnerability affects all versions of glibc since 2.9 (fixed in 2.23). This would span the vast majority of infrastructure services and devices, as well as programming languages and frameworks; PHP, Java, Python, JavaScript and Rails being only some examples. Not surprisingly, even Haskell is not safe. Trust a functional language to have all the reasons in the world to interface via C.
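As a rough first pass, the version range above can be turned into a quick check. This is only a sketch: the helper names are made up for illustration, and it compares upstream version numbers only. Distributions often backport the fix without bumping the version, so a "yes" here is a prompt to read your vendor's advisory, not a verdict.

```python
import platform
import re

AFFECTED_FROM = (2, 9)   # first affected release, per the article
FIXED_IN = (2, 23)       # first fixed upstream release, per the article


def parse_version(text):
    """Extract (major, minor) from a version string such as '2.19'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return int(match.group(1)), int(match.group(2))


def in_affected_range(version_text):
    """Return True if the upstream version number falls in [2.9, 2.23)."""
    return AFFECTED_FROM <= parse_version(version_text) < FIXED_IN


if __name__ == "__main__":
    # platform.libc_ver() reports the C library the Python binary is linked against.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(lib, version, "in affected range:", in_affected_range(version))
    else:
        print("glibc not detected; this check does not apply")
```

Note that this only inspects the interpreter's own C library; statically linked binaries and containers on the same host may carry a different one.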

Do not forget to expect the flaw to be present in the unexpected, like the SSH, Sudo (doing things in the name of "root" is not working out well yet again) and Curl utilities. We can only hope that Solitaire does not need to interact with networked hosts via C. But you never 'really' know.

How it works

Whenever a machine needs to talk to another machine with which it is not on first name basis, it will ask the DNS for the machine's IP, aka its A (address) record. This will most commonly happen via glibc's libresolv library, in which case the vulnerability in question (we hope it gets named soon) can come into play.

If an attacker can man-in-the-middle this interaction or otherwise speak when not spoken to, they can reply to the initiator with an oversized (2048+ bytes) A/AAAA response immediately followed by another response that will overwrite the stack. The resolver initially reserves 2048 bytes on the stack for the answer, and the vulnerable send_dg() and send_vc() functions allocate a whole new buffer on the heap if they receive a response larger than that. Under certain conditions the bookkeeping for the two buffers gets out of step, and the oversized response is written to the stack buffer anyway.

Once you can write past the end of a buffer on the target's stack, you can supply your own code for the target to execute. Unfortunately, it is never the good kind of code.
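To make the size threshold concrete, here is a small sketch that looks at a raw DNS message and reports whether it is larger than the 2048 bytes mentioned above. The function name and the returned fields are invented for this example; the 12-byte header and the position of the TC (truncated) flag come from the DNS wire format in RFC 1035. It does not detect exploitation, it only flags responses big enough to matter.

```python
import struct

STACK_BUFFER_SIZE = 2048  # bytes initially expected for the answer, per the article
TC_FLAG = 0x0200          # "truncated" bit in the DNS header flags (RFC 1035)


def describe_response(message):
    """Summarise a raw DNS message (header plus payload, no TCP length prefix)."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    # The first two 16-bit fields of the header are the query ID and the flags.
    msg_id, flags = struct.unpack("!HH", message[:4])
    return {
        "id": msg_id,
        "size": len(message),
        "truncated": bool(flags & TC_FLAG),
        "oversized": len(message) > STACK_BUFFER_SIZE,
    }


if __name__ == "__main__":
    # A made-up response: a valid-looking header followed by filler bytes.
    header = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)
    print(describe_response(header + b"B" * 2100))
```

Something along these lines could sit in a packet-capture script to spot unusually large answers heading for a resolver you have not patched yet.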

Where with GHOST the issue could be solved by using getaddrinfo() instead of gethostbyname(), here getaddrinfo() is the vulnerable method. (There is certainly a lesson here about quick fixes not lasting very long.) When an A/AAAA lookup is initiated, getaddrinfo() ends up calling send_dg() and send_vc(), which are at the root of the issue. It gets there through the _nss_dns_gethostbyname4_r() function in the libnss_dns.so.2 NSS module of glibc, while send_dg() and send_vc() themselves live in libresolv.
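For a sense of how ordinary the affected call is, the sketch below performs the kind of lookup described above from Python, whose socket.getaddrinfo() hands the request to the C library's getaddrinfo(). The lookup() helper is invented for this example. With AF_UNSPEC the caller asks for IPv4 and IPv6 results together, which on a glibc system is what sets off the paired A/AAAA query.

```python
import socket


def lookup(host, port=80):
    """Resolve host via the C library's getaddrinfo() for both address families."""
    results = socket.getaddrinfo(
        host, port,
        family=socket.AF_UNSPEC,   # ask for IPv4 and IPv6 together
        type=socket.SOCK_STREAM,   # one entry per address rather than per socket type
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({(family.name, sockaddr[0]) for family, _, _, _, sockaddr in results})


if __name__ == "__main__":
    # A numeric address needs no DNS traffic; a hostname would go out to the resolver.
    print(lookup("127.0.0.1"))
```

Nothing about this code is unusual, which is rather the point: almost any program that connects to a named host does the equivalent somewhere underneath.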

Thank the massive ethical hacking and security research community that buffer overflows have been studied since prehistoric times and we already have mechanisms, such as ASLR, that make exploitation far less trivial than it could have been. The interconnectedness of all things, however, tends to lead to unexpected circumstances. Most things are fairly well protected against buffer overflows, but do these things maybe at some point use some obscure non-memory-safe library? You can bet on it.

You can also bet on Google. They came up with some clever proofs of concept (PoCs) to demonstrate bypassing modern buffer overflow mitigations, such as NX and ASLR (and to show off their 1337 skills, no doubt), but for obvious reasons the weaponised exploits will not be made public at least until there is a high degree of certainty that most systems are protected.

The message is clear, however – your fancy stack canaries and randomised memory addresses will be daunting to attackers, but ultimately will not keep you safe.

Enter bad new buffer overflows.

Recommendations

First of all, spend some quality time pinpointing the external-facing services that are vulnerable to the issue. The currently released PoC code will merely crash the targeted service or device, which may be up there on the inconvenience scale for live infrastructures, but less so than having hackers roam around in your servers – crashing them is the least bad thing they would do.

You could, of course, go ahead and fix everything you have just in case, but that will involve downtime anyway. That is just something to accept.

Use the released PoC for this. From there on it should be easy to tell that your system is vulnerable - it crashed.

It may look something like this:

(gdb) x/i $rip
=> 0x7fe156f0ccce <_nss_dns_gethostbyname4_r+398>: ret
(gdb) x/a $rsp
0x7fff56fd8a48: 0x4242424242424242 0x4242424242420042

The best option from here is to upgrade to the fixed version of glibc (2.23). Red Hat customers are in luck, because they also get very helpful instructions on how to achieve this.

If restarting to update all packages really is a problem, you can try listing all running processes that use an old version of glibc. Use the following command:

lsof +c0 -d DEL | awk 'NR==1 || /libc-/ {print $2,$1,$4,$NF}' | column -t

You should then be able to restart these separately, without disrupting the rest of the system.
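If lsof is not available, roughly the same information can be read from /proc on Linux. The following is a sketch under a few assumptions: the function names are invented, the regular expression assumes the replaced library shows up in /proc/<pid>/maps as a libc path followed by "(deleted)", and processes you are not allowed to inspect are silently skipped, so run it as root for a complete picture.

```python
import os
import re

# A mapping whose backing file is a C library that has since been replaced on
# disk, for example ".../libc-2.19.so (deleted)".
STALE_LIBC = re.compile(r"/libc[-.][^/\s]*\s+\(deleted\)")


def uses_stale_libc(maps_text):
    """Return True if any line of a /proc/<pid>/maps dump maps a deleted libc."""
    return any(STALE_LIBC.search(line) for line in maps_text.splitlines())


def stale_processes(proc_root="/proc"):
    """Yield (pid, command) for processes still mapping a deleted libc."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "maps"), errors="replace") as handle:
                maps_text = handle.read()
            with open(os.path.join(proc_root, entry, "comm"), errors="replace") as handle:
                command = handle.read().strip()
        except OSError:
            continue  # the process exited, or we lack permission to read it
        if uses_stale_libc(maps_text):
            yield int(entry), command


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, command in stale_processes():
            print(pid, command)
```

Anything it lists is still running against the old library and is a candidate for an individual restart.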

If you cannot fix it, because it is too much work, and/or you cannot afford the downtime, you can still fortify and hope for the best. Or reconfigure the local DNS for external-facing services to limit the accepted response sizes. The heroes of this story, Google and Red Hat, recommend tools such as DNSMasq.

Given how long it has been from discovering the vulnerability to supplying a fix (the vulnerability has been lying low since last July), it is not unreasonable to suspect the flaw has already been exploited by non-well-meaning individuals. Therefore a health check and a run through the logs may be a good idea if your infrastructure handles security sensitive or otherwise profitable content.

'Mitigations' that do not work:

Setting `options single-request` does not change buffer management and does not prevent the exploit.
Setting `options single-request-reopen` does not change buffer management and does not prevent the exploit.
Disabling IPv6 does not disable AAAA queries. The use of AF_UNSPEC unconditionally enables the dual query.
The use of `sysctl -w net.ipv6.conf.all.disable_ipv6=1` will not protect your system from the exploit.
Blocking IPv6 at a local or intermediate resolver does not work to prevent the exploit. The exploit payload can be delivered in A or AAAA results, it is the parallel query that triggers the buffer management flaw.

(Carlos O'Donell, Red Hat)

Remember - by going through the trouble of securing your systems, you are helping save the Internet (and maybe even make Google reveal their amazing weaponised exploits).

Happy fixing!

Contact and Follow-Up

Alise is part of our Assurance team in our London office. See the Contact page for how to get in touch. She's an IT professional (with a talent for breaking things and then having to fix them), security enthusiast, sometimes a serious person.


The New glibc Vulnerability that Desperately Needs a Name

By Alise Silde
