Polip and entry point obfuscation

A while ago, on a visit to an Anti-Virus lab, we started playing with some Polip samples. One of the analysts mentioned how tedious it was, in some cases, to find the obfuscated entry point in files infected with Polip.

We looked into a few samples of the malware, observing how the transfer of control to Polip code happens. As we read through the code, I started seeing some patterns.
In the cases we looked at, Polip always added a new section to the end of the infected executable. It then picked a call within the original application and modified it to jump to the virus code, which later resumes execution at the call's original target. In that way the infected application keeps its original functionality, just with a small detour.

Needless to say, finding the redirected call(s) within an infected executable by hand would be very tedious, but with some scripting it is really easy. Using IDAPython and our knowledge of what the jumps to the malicious code look like, we can quickly come up with something that lists them for us.
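For a rough idea of what such a redirected call looks like at the byte level, here is a minimal sketch assuming the patched instruction is a 32-bit x86 near `CALL rel32` (opcode `E8`); the post does not say which call encodings Polip actually rewrites. The target is the address of the next instruction plus the signed displacement, so a patched call stands out because its target falls inside the appended section. The helper names and section bounds are hypothetical.

```python
import struct
from typing import Iterator, Optional, Tuple

def call_target(code: bytes, offset: int, address: int) -> Optional[int]:
    """Target of a near CALL rel32 (E8 xx xx xx xx) at code[offset], or None.

    `address` is the virtual address of that instruction; the displacement
    is relative to the end of the 5-byte instruction."""
    if code[offset] != 0xE8:
        return None
    (rel32,) = struct.unpack_from("<i", code, offset + 1)
    return (address + 5 + rel32) & 0xFFFFFFFF

def find_calls_into(code: bytes, base: int, start: int, end: int) -> Iterator[Tuple[int, int]]:
    """Naive byte scan over `code` (mapped at `base`): yield (call address,
    target) for every E8 whose target lies in [start, end)."""
    for offset in range(len(code) - 4):
        target = call_target(code, offset, base + offset)
        if target is not None and start <= target < end:
            yield base + offset, target
```

A raw byte scan like this also matches stray `E8` bytes inside other instructions, which is why working from the disassembler's code references gives far cleaner results.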

Let's take a look at the structure of the executable. The code of the original application will, in most cases, reside in a single segment named CODE, .text or something along those lines. The code of the virus will reside in a segment appended to the end of the file.
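To see which section was appended, one can read the PE section table directly. The sketch below uses only the standard library and assumes a well-formed image (pefile does the same job far more robustly). It lists each section's name, virtual address, virtual size and characteristics, and reports the last one if it carries IMAGE_SCN_MEM_EXECUTE (0x20000000). Packers produce the same layout, so an executable last section is only a hint; the function names are hypothetical.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000

def read_sections(data: bytes):
    """Return (name, virtual_address, virtual_size, characteristics)
    for every entry in the section table of the PE image in `data`."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER starts right after the 4-byte signature
    (num_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (opt_header_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
    table = e_lfanew + 24 + opt_header_size
    sections = []
    for i in range(num_sections):
        hdr = table + 40 * i  # each IMAGE_SECTION_HEADER is 40 bytes
        name = data[hdr:hdr + 8].rstrip(b"\0").decode("latin-1")
        vsize, vaddr = struct.unpack_from("<II", data, hdr + 8)
        (characteristics,) = struct.unpack_from("<I", data, hdr + 36)
        sections.append((name, vaddr, vsize, characteristics))
    return sections

def suspicious_last_section(data: bytes):
    """Return the last section table entry if it is executable, else None."""
    sections = read_sections(data)
    if sections and sections[-1][3] & IMAGE_SCN_MEM_EXECUTE:
        return sections[-1]
    return None
```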



Hence, a simple script that reports every code reference crossing a segment boundary should reveal the transfers of control to the virus. Depending on the executable, some other references might show up as well, but with some additional filtering we are left with just a few, with Polip's entry point(s) among them.

The idea in pseudo-code could look something like the following:


for each segment in the executable:
  for each function in the segment:
    for each instruction in the function:
      for each code reference from the instruction:
        if the reference points to another segment and both source and target segments are marked executable then
          print 'Possible obfuscated entry point found'



And in IDAPython... well, not all that different:


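import idautils
import idc
for seg_start in idautils.Segments():
    for func_start in idautils.Functions(seg_start, idc.get_segm_end(seg_start)):
        for head in idautils.Heads(func_start, idc.find_func_end(func_start)):
            # flow=False skips plain fall-through, keeping jump/call targets
            for ref in idautils.CodeRefsFrom(head, False):
                # permission bit 0x1 is SEGPERM_EXEC: both ends must be executable
                if (idc.get_segm_name(ref) != idc.get_segm_name(func_start)
                        and idc.get_segm_attr(ref, idc.SEGATTR_PERM) & 0x1
                        and idc.get_segm_attr(func_start, idc.SEGATTR_PERM) & 0x1):
                    print('%08x: intersegment reference to %08x' % (head, ref))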



Polip also finds cavities within the "standard" text section and places chunks of itself there. In those cases, this simple idea of looking for inter-segment code references won't yield anything. Fortunately, most of the code lies in the extra section, and by studying the references from that code it is almost trivial to find the chunks that Polip inserted in the cavities... it's just a few more lines of IDAPython, left as an exercise for the reader... ;-)
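One possible way to approach that exercise, sketched in plain Python rather than IDAPython so it runs anywhere: assume the code references have already been exported from the disassembler into a dictionary mapping each source address to its target addresses, and the segments into (name, start, end, executable) tuples. Both the data layout and the segment name used for the virus section are hypothetical. The function collects every target referenced from the virus segment that lands in another executable segment; those are the candidate cavity chunks. The list will also contain the legitimate host addresses Polip returns to (such as the original target of the hijacked call), so it still needs a quick manual review.

```python
from typing import Dict, List, Optional, Tuple

# (name, start, end, executable); end is exclusive
Segment = Tuple[str, int, int, bool]

def segment_of(ea: int, segments: List[Segment]) -> Optional[Segment]:
    for seg in segments:
        if seg[1] <= ea < seg[2]:
            return seg
    return None

def cavity_candidates(code_refs: Dict[int, List[int]],
                      segments: List[Segment],
                      virus_segment: str) -> List[int]:
    """Targets referenced from the virus segment that land in a different,
    executable segment (candidate cavity chunks or return points)."""
    found = set()
    for src, targets in code_refs.items():
        src_seg = segment_of(src, segments)
        if src_seg is None or src_seg[0] != virus_segment:
            continue  # only follow references made by the virus code
        for dst in targets:
            dst_seg = segment_of(dst, segments)
            if dst_seg is not None and dst_seg[0] != virus_segment and dst_seg[3]:
                found.add(dst)
    return sorted(found)
```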

Just a reminder for anyone interested: BlackHat Vegas is coming in a few weeks, and Pedram Amini and I will be teaching our training, "Reverse Engineering on Windows: Application in Malicious Code Analysis". If you want to learn how to build this kind of automation, among other things, we would love to have you in our class.

badass debugger + badass toy = geek pr0n

Today I finally got a hacked-together, minimal version of the iPhone debugger client for BinNavi working. It's heavily based on Patrick Walton's weasel debugger (with HD's updates). Once it is tied into the BinNavi debug client framework, the whole client-server interaction is trivial.

It feels just right: the best-looking debugger together with the slickest device, a recipe for fun ;-)



The test application is telnet on the iPhone. On the iPhone's screen is the debug output from BinNavi's debug client. telnet is launched from an ssh session in OSX, where BinNavi is running.



For anybody trying to link Mach's debugging interface with a C++ iPhone application, remember to use extern "C" when declaring boolean_t exc_server(mach_msg_header_t *in, mach_msg_header_t *out); (it is not declared in the header files, as pointed out in weasel's source code). Otherwise you'll get a nasty "Undefined symbols" message when linking.

extern "C" is also needed for catch_exception_raise(...) so exc_server can call it to handle exceptions. Documented here.
(I used the standard iPhone toolchain on Debian; this is running on firmware 1.1.3.)

Take Two: Packers, Time and Google Groups

November 29, 2007
Filed under: programming, security, tools, visualization 
I just had to do it... This morning I read about chronoscope in a post on the Google Code Blog and I could not resist tinkering with it.

I wrote a Mathematica function to export a time-series of the format (timestamp, value) into the dataset format used by chronoscope.


(* Round to whole seconds so ToString prints plain digits *)
Epoch[date_] :=
  ToString[Round[AbsoluteTime[DateList[ToString[date]]] -
  AbsoluteTime[DateList["1970"]]]];

ChronoscopeJsExport = Function[ {datasetName, id, label, axis, data},
  (* Module keeps jsData local instead of creating a global symbol *)
  Module[{jsData},
  jsData = datasetName <>
  " = {\nId: \"" <> ToString[id] <> "\", \n" <> "domain: [" <>
  StringJoin[ Riffle[ Map[ Epoch, data[[All, 1]] ], ", "] ] <>
  "], \n" <> "range: [" <>
  StringJoin[
    Riffle[ Map[ ToString, data[[All, 2]] ], ", "] ] <> "], \n" <>
  "label: \"" <> ToString[label] <> "\", \n" <>
  "axis: \"" <> ToString[axis] <> "\"\n};";
  jsData
  ]
];



Then I ran it on the packer time series I harvested from Google Groups, picked some widget demo code and put it all together in a mash-up. The results of the quick hack are here: much nicer to visualize than in the previous post (and it's interactive!).
  • Use the mouse-wheel to zoom
  • Drag the plot left/right to browse around different date ranges
  • You can pick any packer and the data will be plotted against the previously selected one
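For readers without Mathematica, here is a rough Python equivalent of ChronoscopeJsExport. It is a sketch that builds the same JavaScript literal (same field names and layout as the Mathematica version) from a list of (datetime, value) pairs; it assumes naive datetimes are meant as UTC, and the names `epoch` and `chronoscope_js_export` are chosen here for illustration.

```python
import calendar
from datetime import datetime

def epoch(ts: datetime) -> int:
    """Whole seconds since 1970-01-01, treating a naive datetime as UTC."""
    return calendar.timegm(ts.timetuple())

def chronoscope_js_export(dataset_name, id_, label, axis, data):
    """Build the same JavaScript dataset literal as ChronoscopeJsExport
    from a list of (datetime, value) pairs."""
    domain = ", ".join(str(epoch(ts)) for ts, _ in data)
    values = ", ".join(str(v) for _, v in data)
    return (f'{dataset_name} = {{\nId: "{id_}", \n'
            f'domain: [{domain}], \n'
            f'range: [{values}], \n'
            f'label: "{label}", \n'
            f'axis: "{axis}"\n}};')
```

The returned string can be pasted into the page's JavaScript alongside the widget code, one call per packer series.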



pefile 1.2.8

November 25, 2007
Filed under: pefile, programming, reverse engineering, security, tools 
And yet another one. pefile 1.2.8 comes with the usual few bugfixes and a slew of enhancements. Some of them are:
  • One can now "relocate" the image by invoking relocate_image(ImageBase) with a new ImageBase; the PE file's relocations will be applied to produce the relocated image.

  • Section entropy is computed faster (thanks to Gergely)

  • MD5, SHA-1, SHA-256 and SHA-512 hashes are calculated on a per-section basis (thanks to Jim Clausing for the suggestion)

  • Improved (or rather, fixed) handling of Unicode strings when parsing the resource information
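The computations behind the relocation and per-section features are small enough to sketch with the standard library alone. This is not pefile's code: it assumes `image` is the mapped image as a bytearray indexed by RVA and `reloc` is the raw content of the base relocation directory, and it handles only 32-bit HIGHLOW relocations. With pefile itself you would simply call relocate_image(ImageBase), or a section's get_entropy() and get_hash_md5()/get_hash_sha1()/get_hash_sha256()/get_hash_sha512() methods.

```python
import hashlib
import math
import struct
from collections import Counter

def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 to 8.0."""
    n = len(data)
    return sum((c / n * math.log2(n / c) for c in Counter(data).values()), 0.0)

def section_hashes(data: bytes) -> dict:
    """Hex digests of one section's raw data."""
    return {name: hashlib.new(name, data).hexdigest()
            for name in ("md5", "sha1", "sha256", "sha512")}

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, nothing to do
IMAGE_REL_BASED_HIGHLOW = 3   # patch a full 32-bit address

def relocate(image: bytearray, reloc: bytes, old_base: int, new_base: int) -> None:
    """Apply 32-bit base relocations to `image` (indexed by RVA) in place."""
    delta = (new_base - old_base) & 0xFFFFFFFF
    pos = 0
    while pos + 8 <= len(reloc):
        # each block: IMAGE_BASE_RELOCATION header, then 16-bit entries
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8:
            break  # malformed or terminating block
        for i in range(pos + 8, pos + block_size, 2):
            (entry,) = struct.unpack_from("<H", reloc, i)
            rel_type, offset = entry >> 12, entry & 0x0FFF
            if rel_type == IMAGE_REL_BASED_HIGHLOW:
                rva = page_rva + offset
                (value,) = struct.unpack_from("<I", image, rva)
                struct.pack_into("<I", image, rva, (value + delta) & 0xFFFFFFFF)
            elif rel_type != IMAGE_REL_BASED_ABSOLUTE:
                raise NotImplementedError(f"relocation type {rel_type}")
        pos += block_size
```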

For more details and downloads head to pefile's project page.

Hex-Rays unleashed

September 18, 2007
Filed under: reverse engineering, security, tools 
Hex-Rays, Ilfak Guilfanov's decompiler, has been unleashed. I have had the chance to play a bit with the beta and it is really impressive, to say the least. This will save reverse engineers so many hours...
