Friday, June 30, 2017

Go Back to Previous Directory in Bash

Wow, I can't believe I wasn't aware of these super useful commands till now!

The following command will change back to your previous directory
$ cd -

The following command will push the current directory to stack and move to the designated directory
$ pushd /new/directory/to/go/to

The following directory will pop from the stack and move to that directory
$ popd

Tuesday, June 20, 2017

Java Comparable and Comparator

In this tutorial, I will go over two similar but very different interfaces in Java: comparable and comparator. For official documentations on these, refer to here and here.

Comparable and Comparator are both Java interfaces to compare two or more objects. The difference is that comparable is the default comparator of the object, whereas comparator is a custom comparison implementation.

For example, assume you are entering personal info. For each personal info object, you have one's name, year, month, and date of birth. One way to sort a list of personal info objects is by their names, but another way is to sort by the date of birth. In Java, one will be able to sort by name or date of birth by providing corresponding comparator to the class. On the other hand, if one is to simply sort the objects without any comparator provided, then the list will be sorted by the class' comparable interface.

Let's take a look at an example.

To run it, simply type in
$ javac && java Personal

Sorting by default
Bob 1999, 1, 2
Charles 1955, 7, 4
Mike 2011, 4, 1
Tom 2011, 4, 4
Tony 2000, 5, 5
Tony 1987, 12, 11

Sorting by date of birth
Charles 1955, 7, 4
Tony 1987, 12, 11
Bob 1999, 1, 2
Tony 2000, 5, 5
Mike 2011, 4, 1
Tom 2011, 4, 4

Sorting by name
Bob 1999, 1, 2
Charles 1955, 7, 4
Mike 2011, 4, 1
Tom 2011, 4, 4
Tony 1987, 12, 11
Tony 2000, 5, 5

In the first sorting, no comparator is provided, so the list is sorted by Personal's comparable interface, namely compareTo method. For the second and third sorting, DobComparator and NameComparator is provided, so the list is sorted by the date of birth and name, respectively. The comparator classes provide compare method which specifies the order in which the list is to be sorted.

Sunday, June 18, 2017

Analysis of Quicksort with Duplicate Keys

In this post, I will present two slightly different versions of quicksort, and examine why one is better than the other.

Let's dig in the code first.

As you can see, the difference between two versions is very subtle. One partitions into two where the left contains values less or equal to pivot key and the right contains values strictly larger. The second version partitions into two where the left contains values less or equal to pivot key and the right contains values greater or equal to the pivot key.

What difference will this make? For an array with distinct keys, the difference is not as significant. However, when we have lots of distinct keys, the story changes. The first one performs extremely poorly when there are many identical keys, whereas the second one performs faster as the number of duplicate keys increases.

$ g++ qsort_analysis.cpp -g -ltictoc -O0 && ./a.out 10000000 10000000
Time elapsed: 2.067320s
Time elapsed: 1.842697s

$ g++ qsort_analysis.cpp -g -ltictoc -O0 && ./a.out 10000000 10000
Time elapsed: 10.985325s
Time elapsed: 1.441884s

How does such subtle difference make such large difference? We can analyze this by looking at an array with all the same keys and observe how many calls each quicksort is called in total.

$ g++ qsort_analysis.cpp -g -ltictoc -O3 -DDEBUG && ./a.out 16 1
#1 quicksort1 with 0,15
#2 quicksort1 with 0,14
#3 quicksort1 with 0,13
#4 quicksort1 with 0,12
#5 quicksort1 with 0,11
#6 quicksort1 with 0,10
#7 quicksort1 with 0,9
#8 quicksort1 with 0,8
#9 quicksort1 with 0,7
#10 quicksort1 with 0,6
#11 quicksort1 with 0,5
#12 quicksort1 with 0,4
#13 quicksort1 with 0,3
#14 quicksort1 with 0,2
#15 quicksort1 with 0,1
#16 quicksort1 with 0,0
#17 quicksort1 with 2,1
#18 quicksort1 with 3,2
#19 quicksort1 with 4,3
#20 quicksort1 with 5,4
#21 quicksort1 with 6,5
#22 quicksort1 with 7,6
#23 quicksort1 with 8,7
#24 quicksort1 with 9,8
#25 quicksort1 with 10,9
#26 quicksort1 with 11,10
#27 quicksort1 with 12,11
#28 quicksort1 with 13,12
#29 quicksort1 with 14,13
#30 quicksort1 with 15,14
#31 quicksort1 with 16,15
Time elapsed: 0.000057s
#1 quicksort2 with 0,15
#2 quicksort2 with 0,7
#3 quicksort2 with 0,3
#4 quicksort2 with 0,1
#5 quicksort2 with 0,0
#6 quicksort2 with 2,1
#7 quicksort2 with 3,3
#8 quicksort2 with 5,7
#9 quicksort2 with 5,5
#10 quicksort2 with 7,7
#11 quicksort2 with 9,15
#12 quicksort2 with 9,11
#13 quicksort2 with 9,9
#14 quicksort2 with 11,11
#15 quicksort2 with 13,15
#16 quicksort2 with 13,13
#17 quicksort2 with 15,15
Time elapsed: 0.000020s

As you can see above, the first version of quicksort is invoked about 2*N times where the second version is invoked about N times for given array of N elements all of equal keys. The reason is that for the first version, each partitioning will be extremely skewed over to the right, whereas for the second version the partition is always at the center, thus leading to much more efficiency.

Saturday, June 17, 2017

Variations in Quicksort

In this post, I would like to go over three different versions of quicksort and compare their performance.

Let's dig into code first.

There are three different version of quicksort. The first is regular binary quicksort, the second is ternary quicksort, and the last is 3-way quicksort. Note that tictoc.h is my custom library; for more details, see this post.

The binary quicksort is what everyone is most familiar with. The left partition contains those less or equal to pivot value, whereas the right partition contains those who are greater or equal to the pivot value.

The ternary quicksort is similar in that it divides into three partitions and two pivot values. The left-most partition contains those less or equal to the first pivot value, the middle partition contains those in between two pivot values, and the right-most partition contains those greater than the second pivot value. Note that I am assuming the first pivot value is less or equal to the second pivot value.

Lastly, the 3-way quicksort partitions the array into 3, where the first is strictly less, the middle is equal to, and the last is strictly greater.

Let's examine their performance. First, when the array values are mostly distinct:
$ g++ -g quicksort.cpp -ltictoc -O3 && ./a.out 1000000000
Time elapsed: 0.983398s
Time elapsed: 0.883001s
Time elapsed: 1.151062s

It seems like ternary quicksort is the fastest among these.

When the array values are less distinct, however, we observe that 3-way quicksort performance starts to take off, while ternary quicksort slows downs greatly.

$ g++ -g quicksort.cpp -ltictoc -O3 && ./a.out 100000
Time elapsed: 0.808114s
Time elapsed: 1.043392s
Time elapsed: 0.829712s

$ g++ -g quicksort.cpp -ltictoc -O3 && ./a.out 1000
Time elapsed: 0.612664s
Time elapsed: 27.944554s
Time elapsed: 0.427633s

We will examine what is going on as we introduce more duplicate key values in the next post.

Assertion: Great Practice for Software Development

In this post, I will discuss recommended usage of assertion in programming. For both C/C++/Java, assert is a great way to alert a programmer that there is something against his expectation.

Let's do a quick example. Say you are implementing quick sort. In particular, you are trying to validate your partition method. You can insert an assertion to make sure that your partition method is functioning as expected, as below:

Well, you must agree that assertion is a very great tool for debugging, as it tells you exactly where it could have gone wrong. Unfortunately, assertion will slow down the performance and is not needed for production code, so do we need to comment them out when releasing the code for production?

In fact, both C/C++ and Java has a nice way to disable and enable the entire assertion statements. For C/C++, you simply need to define a macro NDEBUG, which will disable all assertions. Otherwise, assertions are enabled by default, unless of course you define NDEBUG macro in your source files explicitly. Therefore, you need disable assertions at compile time for production binary:

$ g++ *.cpp -DNDEBUG

Here, -D is the option for implicit #define macro, so the option -DNDEBUG will define NDEBUG macro for all files.

For Java, there is -da and -ea options which will disable and enable assertions, respectively at the runtime. In fact, assertions are by default disabled in Java, so you need to enable with -ea option when debugging, as below:

$ java -ea SomeClass

For more info, please refer to this and this official documentations.

Happy coding!

Monday, June 12, 2017

Setting up Environment for Coursera's Algorithm Class

Yes, I am taking the very basic algorithm class from Coursera. In fact, this class is very good. It is actually a course from Princeton, and I really like the course, except that they use Java, which I am not as familiar with compared to C/C++.

On top of that, the author provides custom jar library which I must use for compiling and running Java projects. Although the instructions are written very clearly on the official website, I still want to go over this bit, because I spent quite some time figuring this out.

In order to compile Java with a custom library, one must specify the library's path with -classpath flag as below:
$ javac -classpath /path/to/custom/library.jar -g

Also, it is handy to include -g flag so that one can debug it later. If you are compiling a program that uses algs4.jar library file provided by the author, you must download the file and specify it with -classpath flag.

In order to run the program, you need to do the same thing, but with the path to source files as well:
$ java -classpath /path/to/custom/library.jar:/directory/to/source/files SomeClass 

Finally, to debug, you need to do the same thing:
$ jdb -classpath /path/to/custom/library.jar:/directory/to/source/files SomeClass

Let's do an example. Let's assume that you have downloaded algs4.jar file to ~/Downloads folder. Say you have created file in ~/Documents folder.

You will need to run the following commands to compile, run, and debug, respectively:
$ javac -classpath ~/Downloads/algs4.jar ~/Documents/ -g
$ java -classpath ~/Downloads/algs4.jar:~/Documents Queue
$ jdb -classpath ~/Downloads/algs4.jar:~/Documents Queue

Happy coding algorithms!

Sunday, June 11, 2017

GDB on Mac OS X (Safe Method)

The officially supported debugging package for Mac OS X is lldb, which is a fine debugger. However, many people including me still prefer to use gdb, the GNU debugger. In fact, I use cgdb, which provides nice color interface to gdb, and that is why I must install gdb on my Mac. Unfortunately, it is not so easy to install and enable gdb to debug on Mac. In the previous post, I presented a method to do this, but I figure now that it was quite unsafe way to do so. For more details, please take a look at the official documentation.

Here is a better way of enabling gdb on your Mac. First, if you haven't installed gdb, do so.
$ brew install gdb

If you are using Sierra or above, run the following as well:
$ echo "set startup-with-shell off" >> ~/.gdbinit

Next, this is the part where we give gdb debug other processes. Open up Keychain Access application and on the menu select Keychain Access -> Certificate Assistant -> Create a certificate.

Enter gdb-cert for the name, select Self Signed Root for Identity Type, select Code Signing for Certificate Type, and check the box Let me override defaults.

Click Continue several times until you see Specify a Location For the Certificate window. Select System for the Keychain and proceed.

Once the certificate is created, double click on it, and in the Trust section locate Code Signing item. Select Always Trust and exit. You will be prompted with your admin password to make the change.

Finally, you can close Keychain Access app and type in the following in terminal:
$ sudo killall taskgated
$ codesign -fs gdb-cert $(which gdb)

That's it! You will be able to use gdb on your Mac!