How to Prepare for Cloudera’s Hadoop Developer Exam (CCD-410)

Cloudera’s Hadoop developer certification is the most popular certification in the Big Data and Hadoop community. As I’ve recently cleared the CCD-410 exam, I want to take the opportunity to share a few points that helped me prepare for the exam and, more importantly, learn Hadoop in a practical way.

Here they are:

  1. Tom White’s Hadoop: The Definitive Guide is an invaluable companion for clearing the exam. It may be the only book you need, as it addresses almost all of the conceptual questions in the exam. Be sure to grab the 3rd edition (the latest to date), which covers YARN.
  2. Don’t overlook the other Apache projects in Hadoop’s ecosystem, such as Hive, Pig, Oozie, Flume, and HBase. There will be questions testing your basic understanding of these topics. Refer to the related chapters in Tom White’s book. There are also plenty of good YouTube videos and tutorials available on the web.
  3. Understand how to use Sqoop. The best way to start may be to create a simple table in MySQL (or any database you choose) and import the data into HDFS as well as into Hive. Understand the different features of the Sqoop tool. Again, Tom White’s book is useful here, as is the Apache Sqoop user guide.
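As a starting point, the imports described above look roughly like the commands below. The connection string, credentials, and target directory are placeholders for your own environment:

```shell
# Import the emp table from a local MySQL database into HDFS
# (jdbc:mysql://localhost/mydb, dbuser/dbpass, and the target
# directory are placeholders for your own setup).
sqoop import \
  --connect jdbc:mysql://localhost/mydb \
  --username dbuser --password dbpass \
  --table emp \
  --target-dir /user/hadoop/emp \
  --num-mappers 1

# The same import, but loading the data into a Hive table
# instead of plain HDFS files.
sqoop import \
  --connect jdbc:mysql://localhost/mydb \
  --username dbuser --password dbpass \
  --table emp \
  --hive-import
```

Experiment with options such as `--where`, `--columns`, and incremental imports to get a feel for the tool.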
  4. Understand the `hadoop fs` shell commands for manipulating files in HDFS.
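At a minimum, be comfortable with the everyday file-manipulation commands, for example:

```shell
hadoop fs -mkdir -p /user/hadoop/input          # create a directory in HDFS
hadoop fs -put emp.txt /user/hadoop/input       # copy a local file into HDFS
hadoop fs -ls /user/hadoop/input                # list directory contents
hadoop fs -cat /user/hadoop/input/emp.txt       # print a file's contents
hadoop fs -get /user/hadoop/input/emp.txt .     # copy a file back to local disk
hadoop fs -rm /user/hadoop/input/emp.txt        # delete a file
```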
  5. To clear the exam you need to be hands-on with the basics of MapReduce programming, period. You will find a lot of questions in the CCD-410 exam asking about the outcome or possible result set of a given MapReduce code snippet. What you need to know and practice is how to convert common SQL data access patterns into the MapReduce paradigm. There will also be questions testing your familiarity with the key classes and methods used in the driver class (for example, the Job class and how it is used to submit a Hadoop job).

Tip: Create two simple text files with a few records, similar to the standard emp and dept tables. Load the files into HDFS. Then develop and test MapReduce programs that produce outputs equivalent to the following queries:

  • select empl_no, empl_name from emp;
  • select distinct dept_name from dept;
  • select empl_no, empl_name from emp where salary > 75000;
  • select empl_name, dept_no, salary from emp order by dept_no asc, salary desc;
  • select dept_no, count(*) from emp group by dept_no having count(*) > 1 order by dept_no desc;
  • select e.empl_name, d.dept_name from emp e join dept d on e.dept_no = d.dept_no;
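Before running anything on a cluster, you can prototype the map/shuffle/reduce flow of one of these queries in plain Java with no Hadoop dependencies. The sketch below mimics the `group by ... having count(*) > 1` query: the map phase emits a (dept_no, 1) pair per record, the shuffle phase groups values by key as the framework would, and the reduce phase sums and filters. The comma-separated record layout (empl_no, empl_name, dept_no, salary) is an assumption for illustration:

```java
import java.util.*;

// Plain-Java sketch (no Hadoop) of the MapReduce flow for:
//   select dept_no, count(*) from emp group by dept_no having count(*) > 1
// Record layout empl_no,empl_name,dept_no,salary is assumed.
public class DeptCountSketch {

    // "Map" phase: emit one (dept_no, 1) pair per input record.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            pairs.add(new AbstractMap.SimpleEntry<>(fields[2].trim(), 1));
        }
        return pairs;
    }

    // "Shuffle" phase: group values by key, as the framework does between phases.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // "Reduce" phase: sum the counts per key and keep only counts > 1.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            if (sum > 1) out.put(e.getKey(), sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> emp = Arrays.asList(
            "1,Alice,10,80000",
            "2,Bob,20,60000",
            "3,Carol,10,90000");
        System.out.println(reduce(shuffle(map(emp))));  // {10=2}
    }
}
```

Once the logic is clear in this form, porting it to real Mapper and Reducer classes is mostly a matter of swapping in the Hadoop types (Text, IntWritable, Context).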

6. You are expected to understand basic Java programming concepts. This is no sweat for those who work in a Java environment regularly, but for the rest of us a basic Java refresher course will be very handy. Pay particular attention to the following topics, which are very helpful when writing and understanding MapReduce code:

  • Regular expressions
  • String handling in Java
  • Array processing
  • The Collections Framework
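As a quick self-check, all four topics above show up together in the classic word-count map logic. This is a refresher sketch, not exam material: it tokenizes a line with a regular expression, handles strings, iterates an array, and accumulates counts in a Collections Framework map:

```java
import java.util.*;

// Refresher tying together regex, string handling, arrays, and the
// Collections Framework, in the shape of word-count map logic.
public class WordCountRefresher {

    // Split a line into lowercase words using a regular expression
    // (one or more non-letter characters act as the delimiter).
    static String[] tokenize(String line) {
        return line.toLowerCase().split("[^a-z]+");
    }

    // Count word occurrences with a TreeMap (sorted by word).
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : tokenize(line)) {
            if (word.isEmpty()) continue;  // split can yield empty strings
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("To be, or not to be."));
        // {be=2, not=1, or=1, to=2}
    }
}
```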

7. Finally, don’t forget to refer to the Cloudera website for the latest updates, study guides, and sample questions for the specific certification you are targeting.

Note that you can optionally buy a practice test from the Cloudera website. If you are well prepared and want to self-check your exam readiness, you may try it out (disclaimer: I did).

I also recommend going through the following article from Mark Gschwind’s BI Blog. It gives you solid direction to jumpstart your exam preparation as well as your Hadoop learning.

All the best in your journey to learn Hadoop and get certified! Please share your experience and comments.