Saint's Log – Page 15

2017-09-30 —Categories: Big Data

Setting up Apache Hadoop

As part of my Dynamic Big Data course, I have to set up a distributed file system to experiment with various mapreduce concepts. Let’s use Hadoop since it’s widely adopted. Thankfully, there are instructions on how to set up Apache Hadoop – we’re starting with a single cluster for now.

I guess we can include VM setup instructions for completeness (for those that don’t already have a Linux environment).

Download and install VirtualBox then grab a copy of the Ubuntu desktop ISO.
Create a new VM in VirtualBox. I elected to use the default 1024MB memory allocation and 10.00GB hard disk. While there may be various reasons for choosing various Hard disk file types, let’s proceed with the default VDI format and have the space be dynamically allocated. Let’s set the size of the virtual hard disk to 256GB as well (“the limit on the amount of file data that a VM will be able to store on the hard disk”).
Select the VM in the left pane, click on the “[Optical Drive] Empty” label in the Storage group on the right pane and select the “Choose Disk Image…” menu option. Select the downloaded Ubuntu ISO (ubuntu-16.04.3-desktop-amd64.iso in my case).
Start the virtual machine. Once Ubuntu boots up, you will be prompted to try it or install it. Let’s install it. Be sure to check the Download updates while installing Ubuntu option so that we don’t need to install updates later.
Since there are no OSs on the VM, proceed with the Erase disk and install Ubuntu option. You may need to pick a username and password before the installation will finally begin!

Once installation is complete, log onto the Ubuntu OS. Set up shared folders and enable the bidirectional clipboard as follows:

From the VirtualBox Devices menu, choose Insert Guest Additions CD image… A prompt will be displayed stating that “VBOXADDITIONS_5.1.26_117224” contains software intended to be automatically started. Just click on the Run button to continue and enter the root password. When the guest additions installer completes, press Return to close the window when prompted.
From the VirtualBox Devices menu, choose Shared Clipboard > Bidirectional. This enables two way clipboard functionality between the guest and host.
From the VirtualBox Devices menu, choose Shared Folders > Shared Folders Settings… Click on the add Shared Folder button and enter a path to a folder on the host that you would like to be shared. Optionally select Auto-mount and Make Permanent.
Open a terminal window. Enter these commands to mount the shared folder (assuming you named it vmshare in step 3 above):

mkdir ~/vmshare
sudo mount -t vboxsf -o uid=$UID,gid=$(id -g) vmshare ~/vmshare

To start installing the software we need, enter these commands:

sudo apt-get update
sudo apt install default-jdk

Next, get a copy of the Hadoop binaries from an Apache download mirror.

cd ~/Downloads
wget http://apache.cs.utah.edu/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz
mkdir ~/hadoop-2.7.4
tar xvzf hadoop-2.7.4.tar.gz -C ~/
cd ~/hadoop-2.7.4

The Apache Single Node Cluster Tutorial says to

export JAVA_HOME=/usr/java/latest

in the etc/hadoop/hadoop-env.sh script. On this Ubuntu setup, we end up needing to

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

If you skip setting up this export, running bin/hadoop will give this error:

Error: JAVA_HOME is not set and could not be found.

Note: I found that setting JAVA_HOME=/usr caused subsequent processes (like generating Eclipse projects from the source using mvn) to fail even though the steps in the tutorial worked just fine.

To verify that Hadoop is now configured and ready to run (in a non-distributed mode as a single Java process), execute the commands listed in the tutorial.

$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar grep input output 'dfs[a-z.]+'
$ cat output/*

The bin/hadoop jar command runs the code in the .jar file, specifically the code in Grep.java, passing it the last 3 arguments. The output should resemble this summary:

If you’re interested in the details of this example (e.g. to inspect Grep.java), examine the src subfolder. If you don’t need the binaries and just want to look at the code, you can wget it from a download mirror, e.g.:

wget http://apache.cs.utah.edu/hadoop/common/stable/hadoop-2.7.4-src.tar.gz

2017-09-11 —Categories: C++

Common C++ Standard Library Compiler Errors (by Rusty C++ Coders)

The first programming assignment in the Operating Systems course can be a challenge for students that haven’t written C++ code in a while. While working with the std::queue data structure in C++, it’s easy to make certain types of mistakes (especially if C++ isn’t your native tongue):

Not “using namespace std” when using standard library containers. This can result in some ugly error messages in Visual Studio, e.g. error/warning codes C2143, C4430, and C2238 for the class member array below (is there a better way for students/developers to find out what is happening when they make such a trivial mistake)?
Not understanding the assignment operator semantics on a container like a queue. If we write queue<type> myqueue = array[i]; we get a copy of the queue array[i] (we might have simply wanted a reference/alias). For such a mistake, the code obviously compiles but doesn’t function as intended.
Declaring a fixed-sized data structure to hold all values from a variable-sized container! Runtime errors take care of informing students about this bug (if they’re not lucky enough to have almost empty variable-size containers). The correct declaration of dynamic arrays of templated items is also not usually obvious: T* all_elements = new T[dynamic_integer_size];

2017-09-11 —Categories: Windows

Fighting Windows Update Failures

For the past month or so, I have been unable to update my 7.5 year-old machine from Windows 10 Build 14393 because “we couldn’t connect to the update service”! Some folks online suggested trying the “Fix problems with Windows Update” wizard. Unfortunately (or fortunately), this Wizard identified a Potential Windows Update Database Error – but it couldn’t fix it!

The next idea was trying the system file checker as mentioned here. This tool did not find any issues on my filesystem. The DISM.exe commands seemed irrelevant so I took a quick peek at the Event Viewer. Lo and behold, a long list of WindowsUpdateFailure3 events. The supposed cure? These commands suggested in this thread (found by Googling “WindowsUpdateFailure3”).

net stop wuauserv
net stop cryptSvc
net stop bits
net stop msiserver
ren C:\Windows\SoftwareDistribution SoftwareDistribution.old
ren C:\Windows\System32\catroot2 Catroot2.old
net start wuauserv
net start cryptSvc
net start bits
net start msiserver

These didn’t solve the problem either. Digging around in log files led me to error code 0x80240438, but then again, no useful hints from queries on this code. The registry keys mentioned didn’t even exist on my machine. OK Google. Now there is a list of connection error codes, but of course it doesn’t include the one from the update failure log.

Desperation: what else haven’t I tried? This page with connection error codes mentions ensuring that http://*.update.microsoft.com is reachable. http://update.microsoft.com redirects to a page describing how to get the Windows 10 Creators Update now. Aha! I end up downloading and running the Windows 10 Upgrade tool. The update hangs (or appears to, after a couple of hours) at 25% so I simply hit the reset button hoping for the best. Sure enough, I end up back at build 14393 after a while. Desperation led me to try this tool again yesterday. I was encouraged to see that it actually downloaded an update to the Windows 10 Upgrade tool. When I popped in after rebooting to update, it was still at 25% but I decided to let it run overnight. Quite pleased was I this morning to come and find a Welcome screen prompting me to select my telemetry, etc, settings. I’m now finally running an up to date OS!

2017-09-10 —Categories: Electrical Engineering

Common Verilog Mistakes by SystemVerilog Newbies

As part of my HDL Digital Design course at the University of Wyoming, I have to implement various modules using Verilog. Interestingly, the course textbook (Digital Design and Computer Architecture) uses SystemVerilog. This leaves an HDL newbie like me in an interesting spot since I tend to make assumptions about Verilog based on what I’ve seen in SystemVerilog, even though the latter is newer. Some common newbie issues I ran into (that were easily resolved by finding examples of Verilog modules):

Declaring inputs as “input xyz[3:0]” instead of “input [3:0]xyz”.
Trying to use assertions. It was quite surprising to me that assertions as explained in SystemVerilog weren’t “a thing” in Verilog. Some digging around led me to this article discussing assertions (and implying there wasn’t a direct way to assert facts in Verilog).
Concatenating values
No assertions? What… oh, I guess I already mentioned that, but I’m new to this HDL arena.

I guess I could have been more diligent in my search for Verilog tutorials for beginners (like this one) to alleviate all the annoyances I ran into assuming that learning SystemVerilog would translate in a completely seamless transition into writing Verilog code. Oh well, I guess learning a new language doesn’t always involve taking the most efficient path from A to B.

2017-09-05 —Categories: Electrical Engineering

Resolving iverilog __gxx_personality_v0 errors

I ran into a strange error while trying to compile a Verilog module using the Icarus Verilog compiler:

A quick lookup of the __gxx_personality_v0 error on StackOverflow revealed to me that the cause of this strange error is a mismatched/missing libstdc++-6.dll, fixable with a simple command:

copy C:\dev\tools\iverilog\bin\libstdc++-6.dll C:\dev\tools\iverilog\lib\ivl

I still need to investigate exactly why this libstdc++-6.dll problem occurs. It might be related to my Git environment (Git Bash?)

2012-01-27 —Categories: Theorem Proving

Installing Isabelle 2011

I’ve spent the past few days making a few skills strides – I finally buckled down and learned how to use Emacs. The tutorial felt somewhat long but was worthwhile. Having discovered how easily I can set margins and have my text automatically wrap at 80 characters has made me a convert! Interesting since Emacs is a program I simply never thought was worth the time until I decided to try out the Isabelle theorem prover. I grabbed the 32-bit Linux bundle, ran the simple commands listed on the download page, and was up and running.

tar -xzf Isabelle2011-1_bundle_x86-linux.tar.gz
Isabelle2011-1/bin/isabelle emacs

Having completed my Emacs tutorial, I then worked through the Basic Script Management portion of the Proof General User Manual. These tasks were sufficient to prepare me for my task of analysing some proofs from The Archive of Formal Proofs.

2011-11-11 —Categories: Regular Expressions

Simple Regex for Finding C++ Blocks

Not particularly accurate (obviously) but sufficient for what I needed:

(?s:\{[^{]+\})

Enables single line matching mode (that matches line termination characters as well), followed by the left curly brace, followed by all subsequent characters that aren’t the right curly brace, and finally ending with a curly brace.

2011-10-07 —Categories: Mozilla Development

My Resolved Mozilla Bugs

Before starting my masters, I worked on a couple of Mozilla/Gecko items on Bugzilla. Here is a list of all tasks I tackled.

2011-09-17 —Categories: Valgrind

Quick Introduction to Valgrind

For my Computer Aided Geometric Design course, a simple program called CPLOT is used for some of the project work. Its documentation is on the course website. After my initial draft of my project 1 implementation crashed, I was inspired to try out Valgrind on CPLOT (even though I was able to find the bugs using my debugger). All that’s needed to install Valgrind on Ubuntu is sudo apt-get install valgrind. The CPLOT code supplied needed a minor change in order to compile, so I changed

#include <string.h>;

into

#include <string.h>;

and all was well with the world again. The updated CPLOT code is available in my public repo. Line 1 below compiles the code. The Valgrind Quick Start Guide recommends using the -g flag to produce debugging information so that Memcheck’s error messages include exact line numbers.

g++ -g -o cplot.out cplot.cpp
valgrind --leak-check=yes ./cplot.out eg1.dat eg1.eps

The eg1.dat file I used is the one on page 3 of the CPLOT documentation. Below is the output from valgrind:

==1594== Memcheck, a memory error detector
==1594== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==1594== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==1594== Command: ./cplot.out eg1.dat eg1.eps
==1594==
In initps: eg1.eps
File processed successfully!
==1594==
==1594== HEAP SUMMARY:
==1594==     in use at exit: 120 bytes in 6 blocks
==1594==   total heap usage: 8 allocs, 2 frees, 824 bytes allocated
==1594==
==1594== 120 (8 direct, 112 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3
==1594==    at 0x402641D: operator new(unsigned int) (vg_replace_malloc.c:255)
==1594==    by 0x8049224: readCurve() (cplot.cpp:182)
==1594==    by 0x8048D89: main (cplot.cpp:127)
==1594==
==1594== LEAK SUMMARY:
==1594==    definitely lost: 8 bytes in 1 blocks
==1594==    indirectly lost: 112 bytes in 5 blocks
==1594==      possibly lost: 0 bytes in 0 blocks
==1594==    still reachable: 0 bytes in 0 blocks
==1594==         suppressed: 0 bytes in 0 blocks
==1594==
==1594== For counts of detected and suppressed errors, rerun with: -v
==1594== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 17 from 6)

This program can be improved by changing the return statements in the body of the main for(;;) loop into break statements. By so doing, all the cleanup code can be placed after that loop, just before the program exits. This also prevents the duplication of the cleanup code. The code after the for(;;) loop then becomes:

    // Deallocate all allocated memory
    for (int i=0; i < NCURVES; i++) {
        delete curves[i];
    }

    // Close the file resources
    fclose(in);
    fclose(ps);
    return 0;

The Curve class destructor then becomes:

Curve::~Curve() {
    for (int j=0; j <= degree; j++)
        delete points[j];

    delete [] points;
};

Running Valgrind on the updated program then gives the following output.

==6539== Memcheck, a memory error detector
==6539== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==6539== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==6539== Command: ./cplot.out eg1.dat eg1.eps
==6539==
In initps: eg1.eps
File processed successfully!
==6539==
==6539== HEAP SUMMARY:
==6539==     in use at exit: 0 bytes in 0 blocks
==6539==   total heap usage: 8 allocs, 8 frees, 824 bytes allocated
==6539==
==6539== All heap blocks were freed -- no leaks are possible
==6539==
==6539== For counts of detected and suppressed errors, rerun with: -v
==6539== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 17 from 6)

There was another bug in the program since it could attempt to plot control polygons for uninitialized curves because of a missing else. I pushed the trivial fix to that. The program also crashed if the input file specified did not exist because fscanf would try to use a NULL parameter. The fix is a simple NULL check.

2011-09-06 —Categories: Parallel Computing

Modelling the Infamous Dining Philosophers

A sample implementation of the dining philosophers problem is provided in the JPF core source code. The solution I gave to this problem was to use the following locking scheme: if a thread has an even id i, then it should lock fork i first and then fork (i + 1) % N. If its id i is odd, then it should lock fork (i + 1) % N first and then lock fork i. Replacing

new Philosopher(forks[i], forks[(i + 1) % N]);

with

Fork leftFork, rightFork;
if (i % 2 == 0) {
    leftFork = forks[i];
    rightFork = forks[(i + 1) % N];
} else {
    leftFork = forks[(i + 1) % N];
    rightFork = forks[i];
}
new Philosopher(leftFork, rightFork);

Should do the trick. DiningPhil.jpf can then be launched to verify the absence of deadlock with this new locking scheme.