Hadoop (CDH3) Quick Start Guide - Documentation

Source: Baidu Wenku · Editor: 神马文学网 · Time: 2024/07/01 13:45:52

To install CDH3 on a single Linux node in pseudo-distributed mode:

  1. Use a supported Linux version. The steps in this section assume you are installing CDH3 on one of the Linux distributions Cloudera supports.

  2. Install the JDK.
    Important: You must install the Java Development Kit (JDK) before installing Cloudera's RPM or Debian packages. Hadoop requires JDK 1.6, update 8 at a minimum, which you can download from the Java SE Homepage. You may be able to install the JDK with your package manager, depending on your choice of operating system. For JDK installation instructions, see Java Development Kit Installation.
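Since the minimum is JDK 1.6 update 8, one way to sanity-check an installed JDK is to parse the update number out of the version string. A small sketch — the version string below is a hard-coded sample; on a real node you would capture it from `java -version` itself:

```shell
# Sanity-check sketch: Hadoop needs JDK 1.6, update 8 or later.
# "1.6.0_12" is a sample value; on a real node capture it with:
#   java -version 2>&1 | head -n 1
ver="1.6.0_12"
update=${ver##*_}          # strip everything up to "_", leaving the update number
if [ "$update" -ge 8 ]; then
  echo "JDK update $update: OK"
else
  echo "JDK update $update: too old, need update 8 or later"
fi
```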


  3. Install Cloudera’s yum or apt repository.
    A repository (yum for Red Hat systems or apt for Debian systems) enables your package manager to install the Cloudera packages very easily.

    On a Debian system, add the following lines to /etc/apt/sources.list.d/cloudera.list (a new file). Replace DISTRO in the example below with the name of your distribution (jaunty or karmic, for example), which you can find by running lsb_release -c:

    deb http://archive.cloudera.com/debian DISTRO-cdh3 contrib
    deb-src http://archive.cloudera.com/debian DISTRO-cdh3 contrib


    On a Debian system, run the following commands. Note that you may first need to install curl (sudo apt-get -y install curl):

    curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
    sudo apt-get update


    On a Red Hat system, run the following commands as the root user (or with sudo):

    curl http://archive.cloudera.com/redhat/cdh/cloudera-cdh3.repo > /etc/yum.repos.d/cloudera-cdh3.repo
    yum -y update yum

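The DISTRO placeholder in the apt lines above can be filled in mechanically from the codename that lsb_release reports. A sketch — "karmic" is used as a sample codename here, since lsb_release is not available everywhere; on a real Debian system you would substitute the output of lsb_release -cs:

```shell
# Sketch: generate the cloudera.list lines for this machine's codename.
# "karmic" is a sample value; on a real Debian system use:
#   DISTRO=$(lsb_release -cs)
DISTRO=karmic
printf 'deb http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$DISTRO"
printf 'deb-src http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$DISTRO"
```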

  4. Install Hadoop in pseudo-distributed mode.
    A pseudo-distributed Hadoop installation is composed of one node running all five Hadoop daemons: namenode, jobtracker, secondarynamenode, datanode, and tasktracker. The following commands will install Hadoop 0.20, along with the appropriate pseudo-distributed configuration:

    On a Debian system, run the following command:

    sudo apt-get -y install hadoop-0.20-conf-pseudo

     


    On a Red Hat system, run the following command as the root user (or with sudo):

    yum -y install hadoop-0.20-conf-pseudo

     


  5. Start the Daemons.
    Run the following command:

    for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
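The glob-driven loop above can be opaque on first reading. A dry-run sketch that prints what would be executed instead of running it — the five daemon names are hard-coded here, since the /etc/init.d scripts only exist once the hadoop-0.20-conf-pseudo package is installed:

```shell
# Dry-run sketch of the start loop: echo each command instead of running it.
# The five daemons listed are the ones a pseudo-distributed install runs;
# the real loop globs /etc/init.d/hadoop-0.20-* rather than naming them.
for service in namenode datanode secondarynamenode jobtracker tasktracker; do
  echo "would run: sudo /etc/init.d/hadoop-0.20-$service start"
done
```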

     


  6. Confirm Hadoop is working by performing some operations and running a job.

    For example, try performing some DFS operations:

    # hadoop fs -mkdir /foo
    # hadoop fs -ls /
    Found 2 items
    drwxr-xr-x   - root   supergroup   0 2009-12-10 19:11 /foo
    drwxr-xr-x   - hadoop supergroup   0 2009-12-10 19:11 /var
    # hadoop fs -rmr /foo
    Deleted hdfs://localhost/foo
    # hadoop fs -ls /
    Found 1 items
    drwxr-xr-x   - hadoop supergroup   0 2009-12-10 19:11 /var


    For example, try to run an example job:

    # hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar pi 2 100000
    Number of Maps = 2
    Samples per Map = 100000
    Wrote input for Map #0
    Wrote input for Map #1
    Starting Job
    [...]
    Job Finished in 34.342 seconds
    Estimated value of Pi is 3.14280000000000000000000000000


    If a value for Pi is displayed, then Hadoop is working. (This particular example estimates Pi from a small number of random samples, which usually yields an inaccurate result.)
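The pi job estimates Pi by sampling points in the unit square and counting how many land inside the quarter circle; the fraction approaches Pi/4 as the sample count grows. The same idea can be sketched on a single node, outside Hadoop, with awk:

```shell
# Monte Carlo Pi sketch (the same idea the pi example uses, no cluster needed):
# sample random points in the unit square; the fraction that falls inside the
# quarter circle of radius 1 approaches Pi/4 as n grows.
awk 'BEGIN {
  srand(42)                       # fixed seed so the run is repeatable
  n = 100000
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1) hits++
  }
  printf "Estimated value of Pi is %.4f\n", 4 * hits / n
}'
```

With 100,000 samples the estimate typically lands within a few hundredths of 3.1416, mirroring the modest accuracy of the two-map Hadoop run above.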

    Next Steps

  • Learn more about installing and configuring CDH3. See the Installation and Configuration Guide
  • Learn how to deploy CDH3 in fully distributed mode on a cluster of machines. See Deploying CDH3 in a Cluster.
  • Watch Cloudera’s training videos and work through Cloudera’s exercises to learn how to write your first MapReduce job. See Training videos and exercises.
  • Learn how to quickly and easily start a CDH cluster in EC2 by using Cloudera's CDH EC2 scripts. See CDH Cloud Scripts.
  • Get help from the Cloudera Support team.