
Big Data

HDFS and HBase CLI reference for the Hadoop ecosystem.

HDFS

# filesystem operations
hdfs dfs -ls /
hdfs dfs -ls -R /path/to/dir
hdfs dfs -mkdir -p /path/to/dir
hdfs dfs -rm -r /path/to/dir
hdfs dfs -rm -skipTrash /path/to/file            # delete immediately, bypassing trash

# upload and download
hdfs dfs -put localfile /hdfs/path/
hdfs dfs -put -f localfile /hdfs/path/          # overwrite if exists
hdfs dfs -copyFromLocal localfile /hdfs/path/
hdfs dfs -get /hdfs/path/file localpath
hdfs dfs -copyToLocal /hdfs/path/file localpath
hdfs dfs -getmerge /hdfs/path/dir localfile      # merge files into one

# file operations
hdfs dfs -cat /path/to/file
hdfs dfs -tail /path/to/file                     # last kilobyte of the file
hdfs dfs -head /path/to/file                     # first kilobyte of the file
hdfs dfs -text /path/to/file                     # display compressed file as text
hdfs dfs -cp /src /dst
hdfs dfs -mv /src /dst
hdfs dfs -du -s -h /path/to/dir                  # directory size summary
hdfs dfs -df -h                                  # filesystem disk usage
hdfs dfs -count -q -h /path/to/dir               # file/directory count with quota
hdfs dfs -chmod -R 755 /path/to/dir
hdfs dfs -chown -R user:group /path/to/dir

# file test
hdfs dfs -test -e /path/to/file                  # exists
hdfs dfs -test -d /path/to/dir                   # is directory
hdfs dfs -test -f /path/to/file                  # is file
hdfs dfs -test -z /path/to/file                  # is zero length

# snapshot (the directory must first be made snapshottable: hdfs dfsadmin -allowSnapshot /path)
hdfs dfs -createSnapshot /path snapshot_name
hdfs dfs -deleteSnapshot /path snapshot_name
hdfs dfs -renameSnapshot /path old_name new_name

# admin operations
hdfs dfsadmin -report                            # cluster status report
hdfs dfsadmin -safemode get                      # check safe mode
hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode leave
hdfs dfsadmin -refreshNodes                      # re-read the include/exclude host files

# block and file check
hdfs fsck /                                      # full filesystem check
hdfs fsck /path/to/file -files -blocks -locations
hdfs fsck / -list-corruptfileblocks

# balancer
hdfs balancer -threshold 10                      # rebalance until each datanode is within 10% of average utilization

# trash
hdfs dfs -expunge                                # delete trash checkpoints older than the retention threshold

# quota
hdfs dfsadmin -setSpaceQuota 1T /path/to/dir     # set space quota (counts all replicas)
hdfs dfsadmin -clrSpaceQuota /path/to/dir
hdfs dfsadmin -setQuota 1000000 /path/to/dir     # set name (file count) quota
hdfs dfsadmin -clrQuota /path/to/dir
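
The `-test` flags listed above print nothing; they report their result only through the exit status (0 when the check passes). A minimal sketch of how that can be combined with `-mkdir -p` and `-put -f` in a script follows. The function names `hdfs_exists` and `hdfs_upload` and their argument order are illustrative assumptions, not part of the `hdfs` CLI.

```shell
# Sketch only: helper names are hypothetical, the hdfs flags are those listed above.

# Return 0 if the given HDFS path exists, non-zero otherwise.
hdfs_exists() {
    hdfs dfs -test -e "$1"
}

# Upload a local file into an HDFS directory, creating the directory if needed.
# Usage: hdfs_upload <local_file> <hdfs_dir>
hdfs_upload() {
    # Fail early if the local source file is missing.
    if [ ! -f "$1" ]; then
        echo "local file not found: $1" >&2
        return 1
    fi
    # -test -d exits non-zero when the target is not an existing directory.
    if ! hdfs dfs -test -d "$2"; then
        hdfs dfs -mkdir -p "$2" || return 1
    fi
    # -f overwrites an existing file of the same name.
    hdfs dfs -put -f "$1" "$2/"
}
```

A call might look like `hdfs_upload ./events.csv /data/raw` (both paths are placeholders), with `hdfs_exists /data/raw/events.csv && echo uploaded` as a follow-up check.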

HBase

