Introducing gnirt

In today’s rapidly evolving landscape of enterprise-grade solutions, I’m proud to announce gnirt: a groundbreaking, cloud-native, blockchain-enabled string analysis platform that leverages cutting-edge machine learning algorithms and quantum-resistant cryptographic protocols to deliver unparalleled performance in extracting human-readable sequences from binary data streams.

Just kidding!

Lately, I’ve been trying to set aside some of my free time for personal projects that dig deep into things I find interesting. The first of these is gnirt, a clone of the GNU strings binary utility written in C. But there’s a catch: it doesn’t use a single #include directive!

GNU Strings

The standard strings command is really simple: it finds human-readable text inside binary files. It’s trivial, but it’s really useful for reverse engineering, forensics, or just figuring out what’s in a random binary file. I decided to reimplement this utility because when I play reverse engineering CTF challenges, strings is usually the first tool I use, right after file and before strace and ltrace (hints about my next project? Who knows 👀).

Here’s how it basically works:

open the file
read the file
for each byte in file:
    if byte is printable:
        add to current string
    else:
        if current string is long enough:
            print it
        reset current string
if current string is long enough:
    print it

That’s basically it. Pretty basic shi.
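The pseudocode maps almost line by line onto C. Here is a minimal sketch that scans an in-memory buffer instead of a file and only counts the strings it finds. count_strings and its parameters are names made up for this example, not gnirt’s actual code. Note the check after the loop, which makes sure a string that runs to the very end of the data is still reported.

```c
/* Hypothetical helper, not taken from gnirt: counts the runs of printable
   bytes that are at least min_length long. */
int count_strings(const unsigned char *data, int size, int min_length){
    int found = 0;
    int current = 0;    /* length of the run being built */

    for(int i=0; i<size; i++){
        if((data[i] >= 32 && data[i] <= 126) || data[i] == 9){
            current++;              /* "add to current string" */
        }else{
            if(current > 0 && current >= min_length){
                found++;            /* "print it" */
            }
            current = 0;            /* "reset current string" */
        }
    }
    /* a run that reaches the end of the data still counts */
    if(current > 0 && current >= min_length){
        found++;
    }
    return found;
}
```

The real tool obviously has to remember the characters too, not just how many there are, but the control flow is the same.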

The beginning

First of all, I started with the obvious approach. I implemented the algorithm described earlier like any other sane person. That worked quite well, but it wasn’t that difficult, and GNU strings does it better anyway.

Then I looked at the source code and noticed a lot of include directives for a program this simple:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv) {

Who uses hashtags in 2025?? Let’s remove them!

Goodbye, formatted print

The first casualty was printf(), from stdio.h. Instead of

printf("Usage: gnirt <path> [min_length]\n");

I wrote my own print. Yes, it’s not formatted and it only prints a string, but that’s all I needed, really.

int length(char* buffer){
    int i=0;
    while(buffer[i] != '\0'){
        i++;
    }
    return i;
}

int print(char* buffer){
    write(1, buffer, length(buffer));
    write(1, "\n", 1);
    return 0;
}

The implementation is quite simple: a custom-built function calculates the length of the string by walking each character until the null terminator, increasing a counter as it goes. Once we have the length of the string, we can use the write() function from unistd.h to write it directly to STDOUT (file descriptor number 1). I call write() a second time to print a newline by default, like Python’s print function does.

memset() and isprint()

Next, I wrote clean_buffer, which does what I was using string.h’s memset for: zeroing every cell of an array. I needed it to reset the printable string buffer every time I hit a non-printable character and had to start again from a clean buffer.

Cleaning a buffer is just a for loop:

int clean_buffer(char* buffer, int length){
    for(int i=0; i<length; i++){
        buffer[i] = 0;
    }
    return 0;
}

This let me remove string.h. Next on the list was ctype.h, which I was using for isprint, the core of the strings utility: a function that tests whether a given character is printable.

To implement this function, it’s enough to take a look at an ASCII table.

A nice life hack, should you ever find yourself alone on Mars: the man command (man ascii) displays an ASCII table in octal, hex and decimal.

ASCII printable characters

As you can see, the printable ASCII range goes from 32 (space) to 126 (tilde); I also accept 9 (tab). All my implementation does is cast the character to an integer and check whether it falls inside this range.

int is_printable(char character){
    if(((int)character >= 32 && (int)character <= 126) || (int)character == 9){
        return 1;
    }else{
        return 0;
    }
}

We can remove ctype.h!
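A quick way to convince yourself that the range check holds is to try it at its edges. The sketch below is the same test written with character literals instead of ASCII codes, which some people find easier to scan. is_printable_byte is a name invented for this example, not something from gnirt. Taking an unsigned char means you don’t have to think about bytes above 127, which turn negative on platforms where char is signed (they are rejected either way).

```c
/* Same check as is_printable, spelled with character literals.
   Hypothetical variant for illustration, not gnirt's code. */
int is_printable_byte(unsigned char byte){
    return (byte >= ' ' && byte <= '~') || byte == '\t';
}
```

The values just outside the range, 31 and 127 (DEL), are both rejected, and so is the newline character.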

The assassination of stdlib.h

Converting strings to integers without atoi() was where things got a bit more interesting.

That was needed for parsing the optional minimum printable string length parameter: if you run gnirt file.bin 8, that 8 is stored as a string and it’s accessible via the argv parameter of the main function.

int atoi(char *input){
    int iterator = 0;
    int number = 0;
    int digit = 0;
    while(input[iterator]!='\0'){
        digit = input[iterator] - 48;
        number = (number*10)+digit;
        iterator++;
    }
    return number;
}

First of all, I needed to iterate over the array of chars until I found the null terminator.

From each character, I subtract 48 (the ASCII code of '0') to get the real digit as an integer, multiply the previous number by 10 and then add the new digit as the units. Quite simple in retrospect, but it’s been interesting to find out how atoi() works under the hood. Goodbye, stdlib.
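To make the arithmetic concrete: for the input "128" the loop computes 0*10+1 = 1, then 1*10+2 = 12, then 12*10+8 = 128. My atoi trusts its input completely, so something like "8x" silently becomes 152 ('x' is 120, minus 48 is 72, plus 8*10). A minimal sketch of a stricter variant, assuming we simply want to reject anything that is not a plain run of digits, could look like this. parse_positive and the -1 error value are choices made for this example, not part of gnirt.

```c
/* Hypothetical stricter variant of the atoi above: returns -1 when the
   input is empty or contains anything other than the digits 0-9.
   Overflow on very long inputs is deliberately not handled. */
int parse_positive(const char *input){
    int number = 0;
    int i = 0;

    if(input[0] == '\0'){
        return -1;
    }
    while(input[i] != '\0'){
        if(input[i] < '0' || input[i] > '9'){
            return -1;
        }
        number = (number*10) + (input[i] - '0');
        i++;
    }
    return number;
}
```

Writing '0' instead of 48 is the same number, it just saves a trip to the ASCII table.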

Changing how the file is read

This is probably the second most interesting way I found to eliminate an include. Originally, I was using stat() to get the file size, and then I allocated a buffer of that size to hold the contents of the input file in memory.

stat() was another dependency and I wanted it gone, but that approach required knowing the size of the file: this forced me to rethink the algorithm a bit.

Instead of allocating a buffer, copying the file from disk into the buffer, and then reading the buffer byte by byte, I shifted to an “on the fly” approach: I read the file in chunks and look for printable chars in each chunk before moving on to the next one, all while maintaining the state between each “shift”. Here’s the relevant code:

    int read_bytes;
    int string_length;
    int chunk_size = 4096;
    char buf[chunk_size];

    // on-the-fly file read and string recognition loop
    string_length = 0;
    read_bytes = read(file_descriptor, buf, chunk_size);
    while (read_bytes > 0){
        for(int i=0; i<read_bytes; i++){
            if (is_printable(buf[i])){
                // add the char to the printable string buffer
            }else{
                if (string_length > 0){
                    // print the string and clean the buffer
                }
            }
        }
        read_bytes = read(file_descriptor, buf, chunk_size);
    }

This has been really interesting because I didn’t just remove a function and reimplement it myself: I changed the way the algorithm behaves so that the function isn’t needed at all.
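To show what maintaining the state between chunks can look like, here is a minimal sketch. It is not gnirt’s actual code: the scanner struct, scan_chunk and scan_finish are names invented for the example, the string buffer is capped at 256 bytes for brevity, and found strings are only counted rather than printed. The key point is that nothing is reset at a chunk boundary, so a string that straddles two read() calls is still seen as one.

```c
#define MAX_STRING 256

/* State that must outlive a single chunk. */
struct scanner {
    char current[MAX_STRING];   /* printable run collected so far */
    int  length;                /* number of bytes stored in current */
    int  min_length;            /* shortest run worth reporting */
    int  found;                 /* runs reported so far */
};

/* Report the current run if it is long enough, then start a new one. */
static void scanner_flush(struct scanner *s){
    if(s->length > 0 && s->length >= s->min_length){
        s->found++;             /* a real tool would print s->current here */
    }
    s->length = 0;
}

/* Feed one chunk, e.g. the bytes returned by a single read() call. */
void scan_chunk(struct scanner *s, const char *chunk, int size){
    for(int i=0; i<size; i++){
        int c = (unsigned char)chunk[i];
        if((c >= 32 && c <= 126) || c == 9){
            /* bytes beyond the cap are dropped, the run itself goes on */
            if(s->length < MAX_STRING - 1){
                s->current[s->length] = chunk[i];
                s->length++;
            }
        }else{
            scanner_flush(s);
        }
    }
}

/* Call once after the last chunk to report a run that ends at EOF. */
void scan_finish(struct scanner *s){
    scanner_flush(s);
}
```

In the read loop, each successful read() would be followed by scan_chunk(&s, buf, read_bytes), with a single scan_finish(&s) once read() returns 0.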

The algorithm is anything but complex, but I found the approach I came up with quite fascinating. And it enabled me to remove the line

#include <sys/stat.h>

Deeper into the apple…

At this point I had just 2 include directives left:

#include <fcntl.h>      // for open()
#include <unistd.h>     // for read() and write()

I needed open to, well, open the file, read to… read the file and write to print to STDOUT, as explained earlier. The thing is, these are all thin wrappers around syscalls.

At this point, why not call the syscalls directly? The problem is that using the syscall() C function requires 2 includes, unistd.h and sys/syscall.h. ABSOLUTELY UNACCEPTABLE.

Do you know what doesn’t need any include and lets you make system calls directly? INLINE ASSEMBLY!

I had never used inline assembly in a C program, and it’s generally discouraged because in the majority of cases the compiler knows better than you what needs to be done. But this whole mini project is just an excuse to learn something new, so who cares!

Assembly is obviously platform specific, and I was writing on my new Apple Mac Mini M4, so I decided to start with a function that makes syscalls via inline assembly on that architecture. Here’s the code:

int macos_arm64_syscall(long number, long arg1, long arg2, long arg3){
    int result;

    asm volatile (
        "mov x16, %1\n"     // syscall number goes in x16
        "mov x0, %2\n"      // arguments go in x0, x1 and x2
        "mov x1, %3\n"
        "mov x2, %4\n"
        "svc #0\n"          // trap into the kernel
        "mov %w0, w0\n"     // the return value comes back in x0
        : "=r" (result)
        : "r" (number), "r" (arg1), "r" (arg2), "r" (arg3)
        : "x0", "x1", "x2", "x16", "memory", "cc"
    );

    return result;
}

As you can see, there’s not a lot of boilerplate code. The structure of the asm volatile block can look a bit cryptic, but in reality it’s quite simple. It’s composed of 4 parts:

raw assembly instructions
output constraints: the variables used to save the output produced by the assembly instructions
input constraints: the variables used as inputs inside the asm code
clobber list: the registers (and other state, such as memory) that the assembly instructions modify, so the compiler knows not to keep anything important there
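If the four parts still feel abstract, a much smaller example may help. The sketch below adds two numbers with a single instruction, with each part labeled in the comments. add_with_asm is a made-up name and has nothing to do with gnirt. It covers ARM64 and x86-64 and falls back to plain C elsewhere. I spell the keyword __asm__ here only because that form also compiles in strict -std=c99/c11 mode; it is the same thing as asm.

```c
/* Hypothetical example, not part of gnirt: shows the anatomy of an
   extended asm statement on the two architectures used in this post. */
long add_with_asm(long a, long b){
    long result;
#if defined(__aarch64__)
    __asm__ volatile (
        "add %0, %1, %2\n"      /* 1. instructions: %0 = %1 + %2 */
        : "=r" (result)         /* 2. outputs: this is %0 */
        : "r" (a), "r" (b)      /* 3. inputs: these are %1 and %2 */
        :                       /* 4. clobbers: nothing else is touched */
    );
#elif defined(__x86_64__)
    __asm__ volatile (
        "addq %2, %0\n"         /* 1. instructions: %0 += %2 (AT&T order) */
        : "=r" (result)         /* 2. outputs: this is %0 */
        : "0" (a), "r" (b)      /* 3. inputs: a starts in the register of %0 */
        : "cc"                  /* 4. clobbers: the add changes the flags */
    );
#else
    result = a + b;             /* any other platform: plain C fallback */
#endif
    return result;
}
```

The operand numbers simply count through the outputs first and the inputs second, which is why the inputs of the syscall function start at %1.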

Despite its look, the code is quite trivial. First of all, it puts the first parameter of the macos_arm64_syscall function, number (the number of the syscall), inside the 64-bit register x16. Inside the asm code, operands are referenced with % followed by their index: %0 is the output, and the inputs are numbered from %1 in the order they appear in the input constraints line.

Then, following the XNU calling convention, it puts the other parameters, arg1, arg2 and arg3, inside the 64-bit registers x0, x1 and x2. At this point it executes svc #0, which raises an exception that hands control to the kernel, which in turn runs the selected system call. It’s the ARM64 equivalent of the syscall instruction or int 0x80 on x86.

The XNU kernel is not really well documented compared to Linux, but you can find a table of the available system calls here: MacOS syscall numbers.

At this point we get the return value of the syscall from the w0 register (which is just the lower 32 bits of x0, enough because the result is an int) and we save it into %0, which is the variable we declared as output in the output constraints.

That’s it! With those lines we can call any macOS syscall natively, eliminating the need for unistd and fcntl. Here’s the implementation of open, read and write that use our syscall function:

int open(const char *path, int flags, int mode){
    return macos_arm64_syscall(5, (long)path, flags, mode);
}

int read(int file_descriptor, void *buffer, int bytes){
    return macos_arm64_syscall(3, file_descriptor, (long)buffer, bytes);
}

int write(int file_descriptor, void *buffer, int bytes){
    return macos_arm64_syscall(4, file_descriptor, (long)buffer, bytes);
}

… and into the penguin

At this point, my goal was accomplished. Well, if we ignore stuff like error handling and all that other fun, unnecessary stuff. But this worked only on my Mac. I wanted to make it work on my Linux machine too!

After a few curses at the AT&T syntax, which I hate, after finding the Linux syscall numbers (this seems like a good source), and after checking the calling convention, I managed to write the same thing, but for Linux:

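int linux_x64_syscall(long number, long arg1, long arg2, long arg3){
    long result;

    asm volatile (
        "movq %1, %%rax\n"      // syscall number goes in rax
        "movq %2, %%rdi\n"      // arguments go in rdi, rsi and rdx
        "movq %3, %%rsi\n"
        "movq %4, %%rdx\n"
        "syscall\n"
        "movq %%rax, %0\n"      // the return value comes back in rax
        : "=r" (result)
        : "r" (number), "r" (arg1), "r" (arg2), "r" (arg3)
        // syscall itself overwrites rcx and r11; the kernel may touch memory
        : "rax", "rdi", "rsi", "rdx", "rcx", "r11", "memory"
    );
    return (int)result;
}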

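The Linux x86-64 syscall convention is that the syscall number goes in rax and up to 6 arguments are passed via the following registers: rdi, rsi, rdx, r10, r8, r9. This is almost the same as the convention for ordinary function calls, except that the fourth argument uses r10 instead of rcx, because the syscall instruction itself overwrites rcx and r11, and nothing is ever passed on the stack. The code above puts the system call number in rax, the 3 arguments into rdi, rsi and rdx, and then executes syscall to make the kernel do what we want, exactly like we did earlier with macOS.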
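The same call can be written more compactly by letting the compiler load the registers through GCC’s x86 register constraints ("a" for rax, "D" for rdi, "S" for rsi, "d" for rdx) rather than with explicit mov instructions. This is a sketch of that alternative, not the code gnirt ships: linux_x64_syscall3 and GNIRT_HAVE_LINUX_X64 are names invented here, and on any other platform the function just reports failure.

```c
#if defined(__linux__) && defined(__x86_64__)
#define GNIRT_HAVE_LINUX_X64 1
/* Hypothetical alternative to linux_x64_syscall above. */
long linux_x64_syscall3(long number, long arg1, long arg2, long arg3){
    long result;
    __asm__ volatile (
        "syscall"
        : "=a" (result)                         /* result comes back in rax */
        : "a" (number),                         /* syscall number in rax */
          "D" (arg1), "S" (arg2), "d" (arg3)    /* rdi, rsi, rdx */
        : "rcx", "r11", "memory"                /* overwritten by syscall or the kernel */
    );
    return result;      /* on Linux, -4095..-1 means -errno */
}
#else
#define GNIRT_HAVE_LINUX_X64 0
/* Placeholder so the sketch still compiles elsewhere. */
long linux_x64_syscall3(long number, long arg1, long arg2, long arg3){
    (void)number; (void)arg1; (void)arg2; (void)arg3;
    return -1;          /* not supported in this sketch */
}
#endif
```

With this version, linux_x64_syscall3(1, 1, (long)"hi\n", 3) would write to STDOUT, since write is syscall number 1 on Linux x86-64.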

Inside each function (open, read, write) I added some logic to distinguish between platforms using preprocessor directives:

int open(const char *path, int flags, int mode){
    #if defined(__APPLE__) && defined(__aarch64__)
        return macos_arm64_syscall(5, (long)path, flags, mode);
    #elif defined(__linux__) && defined(__x86_64__)
        return linux_x64_syscall(2, (long)path, flags, mode);
    #else
        // without this, unsupported platforms would fall off the end of the function
        #error "unsupported platform"
    #endif
}

The End

Despite all its flaws (no unicode support, atoi that accepts only positive integers and nothing else, zero error handling), this project forced me to strip away layers of abstractions one by one, and get closer to how the system actually works.

I really enjoyed questioning the myth that dependencies are fundamental and absolutely essential; on this topic, I can’t help recommending Armin Ronacher’s Build it Yourself post and the related video by Salvatore Sanfilippo. These two pieces are what prompted me to start this project.

If you’d like to see the complete code of gnirt, you can check out its GitHub page. If you have suggestions of any kind, please open an issue, submit a pull request or contact me via mail.

Thanks for reading!