Saturday, August 27, 2011

Compressing file content from Hadoop file system to local file system through gzip

If you ever need to compress a file stored on the Hadoop file system (HDFS) and save the result on the local file system, one way to do it is with a Linux pipe. You can “cat” the file to standard output and let the “gzip” command read that content from STDIN. The following command demonstrates the idea:

$ hadoop fs -cat /user/amgad/file1.txt | gzip > file1.gz

The previous command reads the file “file1.txt” from HDFS and writes the compressed content to “file1.gz” in the current local directory.
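
To check the result, the same pipeline shape can be tried locally. The following is only a sketch: the produce function is a hypothetical stand-in for “hadoop fs -cat /user/amgad/file1.txt” so that it runs without a cluster, and the scratch directory is an assumption.

```shell
# Hypothetical stand-in for: hadoop fs -cat /user/amgad/file1.txt
produce() { printf 'line one\nline two\n'; }

workdir=$(mktemp -d)   # scratch directory (assumption, keeps the demo tidy)

# Same shape as the command above: the producer writes to stdout, gzip reads stdin.
produce | gzip > "$workdir/file1.gz"

# gzip -t tests the archive's integrity; gzip -dc decompresses it to stdout.
gzip -t "$workdir/file1.gz" && echo "archive ok"
gzip -dc "$workdir/file1.gz"
```

By default a pipeline reports the exit status of its last command, so a failed read from HDFS can still leave a valid but empty or truncated archive. Checking the size or content afterwards is worth the extra step.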

Fixing Eclipse when hanging on start up

If you start Eclipse and the splash screen hangs forever, this tip may get you running again without recreating your Eclipse installation and workspace:
$ rm -fr .metadata/.plugins/org.eclipse.core.resources/.snap
From the command line, go to the workspace directory and issue the previous command. It removes the last snapshot Eclipse saved before it was closed incorrectly. That snapshot may be corrupted and may be what causes the hang on start-up.
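
If you would rather keep a copy of the snapshot than delete it, a more cautious variant could look like the sketch below. The workspace here is a temporary stand-in created only so the example runs anywhere; point the variable at your real workspace instead. The “.bak” name is an assumption.

```shell
# Stand-in workspace so the sketch is runnable; replace with your real workspace path.
workspace=$(mktemp -d)
mkdir -p "$workspace/.metadata/.plugins/org.eclipse.core.resources"
: > "$workspace/.metadata/.plugins/org.eclipse.core.resources/.snap"

snap="$workspace/.metadata/.plugins/org.eclipse.core.resources/.snap"

# Move the snapshot aside instead of removing it, so it can be restored if needed.
if [ -f "$snap" ]; then
  mv "$snap" "$snap.bak"
  echo "snapshot moved to $snap.bak"
else
  echo "no snapshot found at $snap"
fi
```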

How to iterate over multiple paths on Linux using wildcards

The following is a list of the wildcards you can use to match several Linux paths at once. They are very handy when you want to pass a command a set of paths that sit under a common directory, among other use cases:
  • *  : zero or more characters
  • ?  : exactly one character
  • [abcde] : exactly one character listed
  • [a-e] : exactly one character in the given range
  • [!abcde] : exactly one character that is not listed
  • [!a-e] : exactly one character that is not in the given range
  • {debian,linux} : each of the listed words in turn (brace expansion, available in shells such as bash)
For example:
 $ cat /home/user/data/{old,new}/* > output.txt
Selects the data from both the “old” and “new” directories and sends the output to “output.txt”.
 $ cat /home/user/data/2011060[1-7]/* > output.txt
Selects the data from the 1st through the 7th of June only and sends the output to “output.txt”.
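
The bracket and “?” patterns can be tried safely in a scratch directory. The sketch below is only an illustration: the directory names and the “part.txt” file name are made up for the demo. Brace expansion is left out so that the example also runs under a plain POSIX sh.

```shell
data=$(mktemp -d)   # scratch directory standing in for /home/user/data

# One directory per day, each holding a small file with the date in it.
for day in 20110601 20110607 20110608 20110615; do
  mkdir "$data/$day"
  echo "$day" > "$data/$day/part.txt"
done

# [1-7] : days 1 to 7 of June
cat "$data"/2011060[1-7]/* > "$data/week1.txt"

# [!1-7] : a last digit outside the 1-7 range
cat "$data"/2011060[!1-7]/* > "$data/rest.txt"

# ? : any single character in the last position
cat "$data"/2011061?/* > "$data/teens.txt"

cat "$data/week1.txt"
```

The shell expands each pattern into a sorted list of matching paths before “cat” runs, so “cat” itself never sees the wildcard.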

How to resolve the Hadoop Map Reduce Out of Memory Exception

If you ever face an out of memory exception while running a Hadoop job on your cluster, consider adding the following parameters to your command line:

-Dmapred.map.child.java.opts=-Xmx1G -Dmapred.reduce.child.java.opts=-Xmx1G

These parameters raise the maximum Java heap size of each map and reduce task to 1 GB. The settings can also be placed in the mapred-site.xml configuration file, if you have access to it, so that you do not have to type them each time you run a Hadoop command.
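
As a rough sketch of where the parameters go, the snippet below assembles a full command and prints it as a dry run. The jar name, class name and paths are hypothetical; only the two -D properties come from the text above. Generic -D options are normally placed before the job's own arguments, and they are only picked up when the job's driver parses generic options (for example through ToolRunner).

```shell
# The two memory settings from the text above.
opts="-Dmapred.map.child.java.opts=-Xmx1G -Dmapred.reduce.child.java.opts=-Xmx1G"

# Hypothetical jar, driver class and HDFS paths, for illustration only.
jar="myjob.jar"
main_class="com.example.MyJob"
input="/user/amgad/input"
output="/user/amgad/output"

# Generic options come right after the class name, before the job's own arguments.
cmd="hadoop jar $jar $main_class $opts $input $output"

# Dry run: print the command instead of executing it.
echo "$cmd"
```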

How to Import table data into mysql through command line

Some GUI tools cannot import a large data file into a MySQL table. MySQL provides a flexible way of importing data into a specific table through the mysqlimport command. The following command imports the data into a table called businessinfo. Note that mysqlimport loads the file into the table that has the same name as the file (without the extension).

mysqlimport -u root -p --local sampledatabase businessinfo.txt

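The inputs to the command are:
  • -u username : the MySQL user to connect as (root in this example)
  • -p : prompt for that user's password before the import runs
  • --local : read the input file from the machine where the command is run
  • sampledatabase : the database name
  • businessinfo.txt : the text file that contains the data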
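
By default mysqlimport expects tab-separated fields with one row per line. The sketch below prepares such a file and shows how the table name follows from the file name. The three columns and their values are made up for the demo, and the import command is only printed, not executed, so no MySQL server is needed.

```shell
workdir=$(mktemp -d)   # scratch directory for the sample file
file="businessinfo.txt"

# Two sample rows, tab-separated (hypothetical columns: id, name, city).
printf '1\tAcme Corp\tCairo\n2\tGlobex\tGiza\n' > "$workdir/$file"

# The target table is the file name without its extension.
table="${file%.*}"
echo "target table: $table"

# Dry run: print the import command rather than running it.
echo "mysqlimport -u root -p --local sampledatabase $workdir/$file"
```

The table must already exist in the database with columns that match the order of the fields in the file.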

Top 20 ad Keywords that bring Google money

Ever wondered which top 20 keywords bring Google its $3 billion a month from AdWords? If you are wondering what AdWords are, they are those small text ads next to the search results. At #1, which comes as a surprise to me personally, is “Insurance”, accounting for 24%. Yes, people interested in insurance bring Google the most ad revenue. To me this shows how people are becoming more and more skeptical and paranoid about all sorts of bad things happening in their lives. The statistic only shows how much money Google makes from these ads, which makes you wonder how much the insurance companies make in return! The rest of the keywords follow the same trend: #2 is “Loans”, which accounts for 12.8%, and #3 is “Mortgage”, accounting for 9% of the ads. Visit the link for a complete graph of the top 20 keyword categories.

Link

The Internet Makes us Dumber

An interesting point of view on the everyday tools we use on the internet and how they affect us. One example given is Twitter, where people no longer care about the details, just the headlines. The main take-home message is that we should rely on in-depth reading in order to learn.

Link

How Yelp Crushed Citysearch & Yahoo Local … & Why Google Is Stealing Yelp’s Playbook

A very interesting study of how the social aspects of the web can affect a whole product, even one that provides good functionality. The interesting thing about Google these days is that it is copying successful models instead of inventing new ways to present its products. I believe one such case is Google+ compared to Facebook. In my opinion, however, this may turn out fine for Google, since it still relies on integrating its products together, which may work to its advantage.

Link

How to Upgrade Eclipse

One of the most daunting tasks that a lot of Eclipse developers faced in the past was upgrading Eclipse from one release to the next. Most people tend to back up the old Eclipse directory, migrate their settings and “hope” nothing breaks. Fortunately there is an easier way to upgrade, using the update feature of Eclipse. The following steps illustrate how to upgrade Eclipse Helios 3.6 to Indigo 3.7:
  1. Go to the “Install New Software” option in the Help menu
  2. Add the new release URL (ex: http://download.eclipse.org/releases/<releasename>)
  3. Close the window
  4. Go to the “Check for Updates” option in the Help menu
  5. You will be presented with a series of updates that correspond to the new URL you just added.

Friday, August 26, 2011

Muhammed (Prayers Be Upon Him) Quote

Four things support the world: the learning of the wise, the justice of the great, the prayers of the good, and the valor of the brave.