- Stop all services (including MapReduce) and client applications deployed on HDFS using the instructions provided here. 
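  As one illustration, with the stock Apache Hadoop 1.x control scripts this step stops the MapReduce daemons while leaving HDFS running for the verification steps below; an HDP install may instead manage daemons through its own service scripts, in which case follow the linked instructions:

  ```
  # Run on the JobTracker host; the script contacts the TaskTracker
  # nodes listed in the slaves file. HDFS itself stays up for now.
  stop-mapred.sh
  ```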
- Run the `fsck` command as instructed below and fix any errors. (The resulting file will contain a complete block map of the file system.)

  ```
  hadoop fsck / -files -blocks -locations > dfs-old-fsck-1.log
  ```
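  One quick way to check the result (a sketch; the log file name follows the command above) is the summary that `fsck` prints at the end of its report:

  ```
  # A healthy file system ends its report with "...is HEALTHY";
  # any corrupt or missing blocks must be fixed before upgrading.
  tail dfs-old-fsck-1.log
  grep -i 'corrupt\|missing' dfs-old-fsck-1.log
  ```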
- Use the following instructions to compare the status before and after the upgrade:

  ![[Note]](../common/images/admon/note.png) Note: The following commands must be executed by the user running the HDFS service (by default, the user is `hdfs`).

  - Capture the complete namespace of the file system. (The following command does a recursive listing of the root file system.)

    ```
    su hdfs
    hadoop dfs -lsr / > dfs-old-lsr-1.log
    ```
  - Run the report command to create a list of DataNodes in the cluster.

    ```
    su hdfs
    hadoop dfsadmin -report > dfs-old-report-1.log
    ```
  - Optionally, copy all of the data stored in DFS, or only the data that cannot be recovered from any other source, to a local file system or to a backup instance of DFS.
- Optionally, repeat steps 3(a) through 3(c) and compare the results with the previous run to ensure that the state of the file system has remained unchanged.
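  A sketch of such a comparison, assuming the repeated runs were redirected into `dfs-old-lsr-2.log` and `dfs-old-report-2.log` (file names here are illustrative):

  ```
  # Empty diff output means the namespace listing and the
  # DataNode report are unchanged between the two runs.
  diff dfs-old-lsr-1.log dfs-old-lsr-2.log
  diff dfs-old-report-1.log dfs-old-report-2.log
  ```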
 
- As the HDFS user, execute the following command to save the namespace:

  ```
  hadoop dfsadmin -saveNamespace
  ```
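  In Hadoop 1.x, `-saveNamespace` is normally refused unless the NameNode is in safe mode; if the command reports this, a sequence along the following lines (a sketch) applies:

  ```
  hadoop dfsadmin -safemode enter   # put the NameNode into safe mode
  hadoop dfsadmin -saveNamespace    # write the current namespace to a new fsimage
  ```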
- Copy the following checkpoint files into a backup directory:

  - `dfs.name.dir/edits`
  - `dfs.name.dir/image/fsimage`
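  For example (a sketch; substitute the actual value of `dfs.name.dir` from your HDFS configuration and a backup location of your choice):

  ```
  # DFS_NAME_DIR is a placeholder for the configured dfs.name.dir path.
  cp ${DFS_NAME_DIR}/edits /backup/namenode/
  cp ${DFS_NAME_DIR}/image/fsimage /backup/namenode/
  ```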
 
- Stop HDFS. Ensure that all HDP services in the cluster are completely stopped at this point.
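  One way to verify that nothing is still running (a sketch; run on each node in the cluster):

  ```
  # No Hadoop daemon processes should remain on any node.
  ps -ef | grep -i hadoop | grep -v grep
  ```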
- If upgrading Hive, ensure that you back up the Hive database. 
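  As an illustration, if the Hive metastore lives in a MySQL database named `hive` (the database name and user here are assumptions; adjust for your installation):

  ```
  # Dump the metastore schema and data to a file that can be restored later.
  mysqldump -u hive -p hive > hive-metastore-backup.sql
  ```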
- Configure the local repositories.

  The standard HDP install fetches the software from a remote yum repository over the Internet. To use this option, you must set up access to the remote repository and have an available Internet connection for each of your hosts.

  ![[Note]](../common/images/admon/note.png) Note: If your cluster does not have access to the Internet, or if you are creating a large cluster and want to conserve bandwidth, you can instead provide a local copy of the HDP repository that your hosts can access. For more information, see Deployment Strategies for Data Centers with Firewalls, a separate document in this set.

  For each node in your cluster, download the yum repo configuration file `hdp.repo`. From a terminal window, type:

  - For RHEL and CentOS 5:

    ```
    wget http://public-repo-1.hortonworks.com/HDP-1.2.0/repos/centos5/hdp.repo -O /etc/yum.repos.d/hdp.repo
    ```
  - For RHEL and CentOS 6:

    ```
    wget http://public-repo-1.hortonworks.com/HDP-1.2.0/repos/centos6/hdp.repo -O /etc/yum.repos.d/hdp.repo
    ```
  - For SLES 11:

    ```
    wget http://public-repo-1.hortonworks.com/HDP-1.2.0/repos/suse11/hdp.repo -O /etc/zypp/repos.d/hdp.repo
    ```
 
- Confirm the HDP repository is configured by checking the repo list.

  - For RHEL/CentOS:

    ```
    yum repolist
    ```
  - For SLES:

    ```
    zypper repos
    ```
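  A quick filtered check (a sketch; the exact repository id depends on the contents of `hdp.repo`):

  ```
  yum repolist | grep -i hdp    # RHEL/CentOS; on SLES, inspect "zypper repos" output instead
  ```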
 
 


