Monday, October 2, 2017

Subclassing HTMLParser Class in Python 2

Using HTMLParser class (https://docs.python.org/2/library/htmlparser.html) in Python 2 is rather easy if you don't need to pass parameter to your subclass for custom processing of the HTML tags. But, what if you do? This is rather trivial to do in Python 3, as seen here. The problem with Python 2, if you follow the "normal" way of invoking the parent HTMLParser class as explained at https://stackoverflow.com/questions/2399307/how-to-invoke-the-super-constructor , you would encounter error like this: TypeError: super() argument 1 must be type, not classobj.

Now, how to fix that error? The error culprit is explained at: https://stackoverflow.com/questions/1713038/super-fails-with-error-typeerror-argument-1-must-be-type-not-classobj#1713052. However, it doesn't give us satisfactory fix for the error because you would need to mess with HTMLParser class for that to work. I prefer not to do it. This is where Python's type keyword comes to the rescue. The code below shows how to properly subclass HTMLParser in Python 2, it might not be pretty a.k.a it's a rather quick-hack, but it works.
from HTMLParser import HTMLParser
from htmlentitydefs import name2codepoint

class ImgHtmlParser(HTMLParser):
    def __init__(self, path):
        super(type (self), self).__init__()
        self.reset()
        self.fed = []
        self.download_path = path
        print "ImgHtmlParser constructor"

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            print "Start tag:", tag
            for attr in attrs:
                print "     attr:", attr
                if attr[0] == "data-fullres-src":
                    print "image URL: " + attr[1]
                    print "Download Path = " + self.download_path 

I used the type keyword in place of the derived class literal name. It's not foolproof though if ImgHtmlParser class has a child class, but in this case, it doesn't have one. So, we're OK.

Monday, May 15, 2017

Checking whether you have MS17-010 Windows Update Installed (a.k.a Guarding Against WannaCry)

Kaspersky Lab GReaT team explains about protecting yourself from WannaCrypt/WannaCry infection over at: https://blog.kaspersky.com/wannacry-ransomware/16518/. The article specifically mentioned:
"Install software updates. This case earnestly calls for installing the system security update MS17-010 for all Windows users, especially when Microsoft even released it for systems that are not officially supported anymore, such as Windows XP or Windows 2003. Seriously, install it right now. Now is exactly the time when it's really important."
The explanation above specifically mentioned about MS17-010 Windows system security update, explained at: https://support.microsoft.com/en-us/help/4013389/title (the vulnerability is explained at https://technet.microsoft.com/library/security/MS17-010). What is not very clear is how do you check whether the update is already installed on your Windows machine or not. The steps are easy for Windows power users but not trivial at all for those not familiar with Windows Update mechanism.

I'll show you how to do this on Windows 10 version 1607. You can carry out similar steps for other Windows version.

  1. First step, is locate the KB (Knowledge Base) number of the specific security update. So, we look for MS17-010 security update explanation. It's at: https://support.microsoft.com/en-us/help/4013389/title. Scroll down to your specific Windows version, for Windows 10 build 1607, we found the KB number from the update file name: Windows10.0-KB4013429-x64.msu. The filename indicates the KB number to be: KB4013429.
  2. Search for the specific Windows KB "support article". In this case, just search for "KB4013429" (without the quotes) in a search engine. We found this at: https://support.microsoft.com/en-us/help/4013429/windows-10-update-kb4013429. What is important to look at is the hotfix number of updates superseding our target update, because if either of the hotfix is installed, we're basically good, i.e. we have MS17-010 fixes installed. Like so: 
    Superseding Windows Hotfix numbers (circled in RED)
  3. Now, we know what  Windows update hotfix versions we need to check for. For our Windows 10 version 1607, the superseding hot fixes are: KB4019472 (OS Build 14393.1198), KB4015217 (OS Build 14393.1066 and 14393.1083), KB4016635 (OS Build 14393.970), KB4015438 (OS Build 14393.969). Therefore, if any one of them are installed. We're good.
  4. Check for installed updates in the Windows machine. We can use systeminfo command line utility for that. For example: 
    C:\systeminfo > c:\Users\blah\Desktop\updates.txt
    
    Open updates.txt to see the installed Hotfix versions. This is a sample output in updates.txt
    Installed hotfixes in Windows 10 Machine

At this point, we can be sure whether MS17-010 or its equivalent is installed. Hopefully, this helps those wanting to know whether their system has MS17-010 update installed or not.

Cheers.

Tuesday, May 2, 2017

"Signal" Handling in Windows Console Application

Signal handling in Windows console application is quite different from what POSIX defines. Well, you could do it the POSIX way if you're using Visual Studio (see: signal). But, the behavior is not quite like POSIX in all circumstances. The Windows native "signal" handling is the way to go if you're using third party compiler suite or cross-compiling via MinGW-W64. The native "signal" handling is also known as Windows Console Control Handlers. The Console Control Handlers are "reachable" via native Windows API.

There is a peculiarity in Windows Console Control Handler compared to the way POSIX handle CTRL+C (SIGINT) signal. In Windows, a new thread is created by Windows which invoke the registered control handler to process the signal, see: CTRL+C and CTRL+BREAK Signals. Contrary, in POSIX, the OS doesn't run the signal handler in a new thread.

Now, let's look at how you would implement a native Windows signal handler for console application. The Windows API that you need is: SetConsoleCtrlHandler(). As for, how to use the function, MSDN has it covered: Registering a Control Handler Function. FYI, I have tested part of the routine with Mingw-w64 cross compiler suite and run the executable in Windows 10. I confirmed that it works as "advertised".

Tuesday, March 21, 2017

Path with Backslash in C++11 Regex

String with backslash in C/C++ is sometimes problematic due to backslash being an escape character in C/C++ character string. It even got more complicated as we use the string in regular expression (regex) because backslash is also an escape character in regex. Therefore, the number of backslashes that you need grows exponentially if you intend to feed literal backslash character into the regex engine. Let's see a working sample code.
void regex_test()
{
    cout << "Executing " << __func__ << endl;

    std::string s ("This machine has c:\\ ,D:\\, E:\\ and z:\\ drives");
    std::smatch m;

    /**
     * We have to use \\\\ so that we get \\ which means an escaped backslash.
     *
     * It's because there are two representations. In the string representation
     * of the regex, we have "\\\\", Which is what gets sent to the parser.
     * The parser will see \\ which it interprets as a valid escaped-backslash
     * (which matches a single backslash).
     */
    std::regex e ("[a-zA-Z]:\\\\");   // matches drive path

    std::cout << "Target sequence: " << s << std::endl;
    std::cout << "Regular expression: /[a-zA-Z]:\\\\\\\\/" << std::endl;
    std::cout << "The following matches and submatches were found:" << std::endl;

    while (std::regex_search (s,m,e)) {
        for (auto x:m) std::cout << x << " ";
        std::cout << std::endl;
        s = m.suffix().str();
    }
}

The code above shows that you need four backslashes to feed one literal backslash into the regex engine. Why is that? because you need four backslashes to produce two escaped backslashes in the regex string. The other two backslashes act as escape characters in the C/C++ compiler that you use.
Therefore, the "produced" two backslashes then act as a single escaped backslash for the regex engine which parses the input string.
Anyway, to give you an idea, this is the output of the function above in Linux:
Executing regex_test
Target sequence: This machine has c:\ ,D:\, E:\ and z:\ drives
Regular expression: /[a-zA-Z]:\\\\/
The following matches and submatches were found:
c:\ 
D:\ 
E:\ 
z:\ 
I hope this helps poor souls out there working with regex in C/C++.

Tuesday, March 14, 2017

Free "Remote Desktop" Setup for Windows Home Editions

As you might already know, all Windows Home Edition variants don't support Microsoft Remote Desktop Protocol (RDP) https://en.wikipedia.org/wiki/Remote_Desktop_Protocol. Therefore, you need different software stack as a solution to the problem.

Enter VNC (https://en.wikipedia.org/wiki/Virtual_Network_Computing). VNC is another protocol to remotely control Windows desktop. VNC can be used as RDP replacement, especially, if you want to control Windows Home desktop from Linux. This is a verified setup that I have tested:

  • The VNC client, i.e. the machine that will be used to control the Windows Home desktop, runs Linux. In my case it's a XFCE4 window manager. I'm using Vinagre (https://wiki.gnome.org/Apps/Vinagre/) as the client application to access the remote VNC server. 
  • The VNC server, i.e. the machine that runs Windows Home OS, is running TigerVNC(http://tigervnc.org) as the application that implements VNC server functionality. TigerVNC is very well maintained as you can see in their github page: https://github.com/TigerVNC/tigervnc.
Hopefully, this is helpful for those thinking about remotely controlling a Windows Home desktop machine.

Monday, February 27, 2017

Parallel Build in Linux/Unix

As a software developer, lengthy build time is always an enemy. You want to do almost anything to shorten the build time. One of the way to do that is to make the build process runs in parallel. If you are using GNU Make, it's as easy as adding "-j" flag to your build script. This is a sample bash script to do that:
#!/bin/bash

_architectures="x86_64-w64-mingw32 i686-w64-mingw32"
CPU_CORES="$(nproc)"

build_exe () {
 local arch=$1
 local core_count=$2

 pushd build-$arch
 CMAKE_INCLUDE_PATH="/usr/"$arch"/include"
 echo "CMAKE_INCLUDE_PATH = "${CMAKE_INCLUDE_PATH}
 export CMAKE_INCLUDE_PATH
   $arch-cmake -G"Unix Makefiles" -DCMAKE_BUILD_TYPE=Debug ..
   ##$arch-cmake -DCMAKE_BUILD_TYPE=Release ..
 make VERBOSE=1 -j$core_count
 popd
}

for _arch in ${_architectures}; do
 case "$1" in 

  clean) if [ ! -d build-${_arch} ]; then
             echo "build directory does not exist. Terminating script.."
         else 
             rm -rvf build-${_arch}
         fi
   ;;
 
  rebuild) if [ -d build-${_arch} ]; then
              rm -rvf build-${_arch}
           fi

      mkdir -p -v build-${_arch}

      ## call build function 
      build_exe ${_arch} ${CPU_CORES}
   ;;
 
  *) if [ ! -d build-${_arch} ]; then
     echo "build directory does not exist. Creating directory.."
     mkdir -p -v build-${_arch}
     fi
   
     ## call build function 
     build_exe ${_arch} ${CPU_CORES}
   ;;
 esac

done 
The preceding script is probably rather intimidating. However, it's just a simple bash script. Just focus to the build_exe() function. That's where the core of the action happens: make is invoked with -j parameter, followed by the number of CPU cores in the system. FYI, the script above is a cross-compilation script which runs on Linux and creates Windows executables. But, the latter fact shouldn't deter you from trying to understand the big picture though ;)

The script shows how to obtain the number of CPU cores in bash, i.e. via the nproc command. nproc is part of coreutils in Linux. If you're using other kind of Unix, try to find an equivalent command. Once, the number of CPU cores is known, that number is used as parameter to make. Running make in parallel should cut down the build time quite a bit. In some projects, it could be quite a lot of saving in build time.

Not all projects can benefit from parallel build. However, it's worth it to try modifying your project to use parallel build before discounting it as useless ;-)

Monday, February 13, 2017

Fix for systemd v232 build failure when using GNU gperf 3.1

You might encounter the build failure in this post title if you're the kind that roll your own Linux Systemd. I encountered it while building Systemd package for my Arch Linux SELinux variant.

The culprit is mismatch in lookup functions declaration--hash functions--generated by GNU gperf version 3.1 and the function declaration in Systemd version 232. I managed to complete the build after creating and using this patch:https://github.com/pinczakko/systemd-gperf-3.1-patch. As for whether the patch is working or not, well, it works without problems in my machine. Nonetheless, it's just a very minor patch.

UPDATE:
------------
This issue has been fixed just now in systemd v232. See: https://github.com/systemd/systemd/commit/c9f7b4d356a453a01aa77a6bb74ca7ef49732c08

UPDATE 2:
---------------
You can add the change as cherry-picked git change to the PKGBUILD to fix this issue in Arch Linux SELinux package. This is the diff (or patch):
diff --git a/PKGBUILD b/PKGBUILD
index 47d82d1..1e57ec7 100644
--- a/PKGBUILD
+++ b/PKGBUILD
@@ -61,6 +61,7 @@ _backports=(
   'cfed63f60dd7412c199652825ed172c319b02b3c'  # nspawn: fix exit code for --help and --version (#4609)
   '3099caf2b5bb9498b1d0227c40926435ca81f26f'  # journal: make sure to initially populate the space info cache (#4807)
   '3d4cf7de48a74726694abbaa09f9804b845ff3ba'  # build-sys: check for lz4 in the old and new numbering scheme (#4717)
+  'c9f7b4d356a453a01aa77a6bb74ca7ef49732c08'  # build-sys: add check for gperf lookup function signature (#5055)
 )
  _validate_tag() {
Hopefully, this temporary fix could help before the official fix is included in the main Arch Linux package.

Wednesday, January 18, 2017

64-bit Software Development on IBM AIX

In this post I'll talk about software development on IBM AIX by means of open source software tools in concert with native AIX development tools.

Using GCC as the compiler to compile your application in AIX is just fine. However, GCC's ld (ld-gcc) linker is not suitable to be used as the linker. This is because linking in AIX is rather tricky and apparently only AIX linker (ld-xlc) work reliably. You can read more about this issue at Using the GNU C/C++ compiler on AIX and AIX Linking and Loading Mechanism.

AIX also has its set of binary utilities (binutils) programs. They are basically the analog of GCC binutils. AIX has native ar archiver, native ld linker (this one is the linker from the AIX xlc compiler suite), and dump utility which is analog to objdump in GCC binutils.

Now, let's see what you need to do to create 64-bit applications in AIX by using the GCC compiler and the native binutils.

  • Pass -maix64 parameter to the GCC C compiler to instruct it to emit the correct AIX 64-bit object file that can be handled by the native AIX linker.
  • Pass -b64 parameter to the native linker via GCC. You should use GCC's -Wl switch for that. The overall parameter becomes -Wl, -b64. IBM AIX ld command reference explains the parameter in detail.
  • Pass -X64 parameter to the native ar archiver to build a 64-bit AIX library. IBM AIX ar command reference explains the parameter in detail.
Once you have build the 64-bit executable or library, you may want to examine it. In Linux or BSD Unix, you would use objdump for that. In AIX, you can use the native dump utility. You need to pass -X64 parameter to dump to instruct it to work in 64-bit mode, i.e. treat the input executable/library as 64-bit binary. For example, the command to show the dependencies of a 64-bit AIX application is: dump -X64 -H.  Refer to the IBM AIX dump command reference for more details.

Listening to Multicast Netlink Socket

Netlink is the recommended way to communicate with Linux kernel from user-space application. In many cases, the communication is unicast, i.e. only one user-space application uses the netlink socket to communicate with a kernel subsystem that provides the netlink interface. But, what if the kernel subsystem provides a multicast netlink socket and you want to listen to the multicast kernel "message"s through netlink? Well, you could do that by bind()-ing to the multicast address provided by the kernel. I'm not going to provide a complete sample code. Just the most important code snippets.

First, you should head over to this netlink discussion to get a sense of the overall netlink architecture.

Once you grasped the netlink architecture, you may follow these steps/algorithm to "listen" to the multicast address(es) provided by the kernel subsystem through netlink:
  1. Init a netlink socket to the kernel subsystem you wish to access. Remember to use #include<linux/[subsystem_header].h>.
  2. Carry-out initialization on the socket if needed.
  3. Bind the socket to the multicast address provided by the kernel subsystem. The multicast address is basically a combination of the following: 
    • The netlink address family, i.e. AF_NETLINK.
    • The netlink multicast group which you can find in the kernel header. For example: the multicast group address for audit subsystem (a constant), is in the audit header file, i.e. <linux/audit.h>
    • Your application's Process ID (PID). 
  4. Read from the socket, when there is data coming in. You might want to use event-based library here, such libev or libevent. In many cases, the kernel only provides a multicast "Read-Only" channel, i.e. you can only read from it. It's not meant to be used to "write" to the kernel.
Step 3 above is probably rather vague. The code below clarify the multicast address that I talked about in that step. Look at the s_addr variable in the code below, it is the multicast address used by bind() to listen to kernel messages. The PID is included in the multicast address because the kernel need to know to which process the message should be sent.
 // ..  
 struct sockaddr_nl s_addr;
 memset(&s_addr, 0, sizeof(s_addr));
 s_addr.nl_family = AF_NETLINK;
 s_addr.nl_pad = 0;
 s_addr.nl_pid = getpid();
 s_addr.nl_groups = AUDIT_NLGRP_READLOG;

 retval = bind(fd, (struct sockaddr *)&s_addr, sizeof(s_addr));
 if (retval != 0) {
  PRINT_ERR_MSG("Failed binding to kernel multicast address");
  return -1;
 }
 // ..
Anyway, because the channel used by the code is multicast channel, multiple user-space application can "listen" to the same kernel subsystem simultaneously. The scenario explained here is not the norm. But, some use cases required this approach.

Monday, January 9, 2017

The Importance of C/C++ Program Exit Status in Unix/Linux

The return value from main() in C/C++ programs a.k.a exit status is often overlooked by less advanced Unix/Linux programmers. Nevertheless, it's important to keep in mind the exit status of your C/C++ code because it will help in the long run. There are at least 2 scenarios where exit status is important:

  1. When you're using shell script to automate processing by using several programs to perform "sub-tasks". In this case, the shell script--very possibly--need to make logical decision based on your program exit status. 
  2. When your C/C++ program is part of a multiprocess program in which your C/C++ program is called/executed (a.k.a fork-ed and exec-ed) by the parent process. In many cases, the parent process need to know whether your program executes successfully or not.

Now, let's be more concrete. Let's say you anticipated that your C/C++ program will be invoked by bash-compatible shell. In that case, your code's exit status must make sense to bash. Therefore, you should:


Following the rules above doesn't necessarily mean your C/C++ program will be bug free because some of the exit status are ambiguous. Nevertheless, it should make it more palatable to be combined into larger system and ease debugging.

As closing, let's look at a very simple code that uses sysexits.h below.
#include <stdio.h>
#include <sysexits.h>
/**
 * Usage: 
 *  test_code  -x param1 -z param2
 */
int main(int argc, char *argv[])
{

 if (argc != 5) {
  printf("Usage: %s -x param1 -z param2\n", argv[0]);
  return EX_USAGE;
 }


      //... irrelevant code omitted

       return EX_OK;
}