CCA-500 Practice Exam Questions with Answers Cloudera Certified Administrator for Apache Hadoop (CCAH) Certification

Question # 6

For each YARN job, the Hadoop framework generates task log files. Where are Hadoop task log files stored?

Cached by the NodeManager managing the job containers, then written to a log directory on the NameNode

Cached in the YARN container running the task, then copied into HDFS on job completion

In HDFS, in the directory of the user who generates the job

On the local disk of the slave node running the task

Question # 7

Which two steps must you take if you are running a Hadoop cluster with a single NameNode and six DataNodes, and you want to change a configuration parameter so that it affects all six DataNodes? (Choose two)

You must modify the configuration files on the NameNode only. DataNodes read their configuration from the master nodes

You must modify the configuration files on each of the DataNode machines

You don’t need to restart any daemon, as they will pick up changes automatically

You must restart the NameNode daemon to apply the changes to the cluster

You must restart all six DataNode daemons to apply the changes to the cluster

Question # 8

You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn’t optimized for storing and processing many small files, you decide to take the following actions:

1. Group the individual images into a set of larger files

2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop Streaming.

Which data serialization system gives the flexibility to do this?

CSV

XML

HTML

Avro

SequenceFiles

JSON
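To make the small-files problem in this question concrete, the arithmetic can be sketched as below. The image count and average size come from the question; the 128 MB target size for each grouped file is an assumption chosen only for illustration, not a value stated in the exam.

```python
# Figures from the question
NUM_IMAGES = 60_000_000
AVG_IMAGE_KB = 25

# Assumption for this sketch: each grouped file targets about 128 MB
TARGET_FILE_MB = 128

# Total raw data volume
total_kb = NUM_IMAGES * AVG_IMAGE_KB
total_gb = total_kb / (1024 * 1024)

# How many 25 KB images fit into one grouped file of the assumed size
images_per_file = (TARGET_FILE_MB * 1024) // AVG_IMAGE_KB

# Number of grouped files needed (ceiling division)
grouped_files = -(-NUM_IMAGES // images_per_file)

print(f"Total data: {total_gb:.1f} GB")
print(f"Images per grouped file: {images_per_file}")
print(f"Grouped files needed: {grouped_files}")
```

Under these assumptions, tens of millions of tiny files collapse into roughly eleven thousand larger ones, which is far fewer objects for the NameNode to track.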

Question # 9

You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?

Sample the web server logs from the web servers and copy them into HDFS using curl

Ingest the web server logs into HDFS using Flume

Channel these clickstreams into Hadoop using Hadoop Streaming

Import all user clicks from your OLTP databases into Hadoop using Sqoop

Write a MapReduce job with the web servers for mappers and the Hadoop cluster nodes for reducers

Question # 10

In CDH4 and later, which file contains a serialized form of all the directory and file inodes in the filesystem, giving the NameNode a persistent checkpoint of the filesystem metadata?

fstime

VERSION

Fsimage_N (where N reflects transactions up to transaction ID N)

Edits_N-M (where N-M specifies transactions between transaction ID N and transaction ID M)

Question # 11

Which YARN process runs as “container 0” of a submitted job and is responsible for resource requests?

ApplicationManager

JobTracker

ApplicationMaster

JobHistoryServer

ResourceManager

NodeManager

Question # 12

On a cluster running MapReduce v2 (MRv2) on YARN, a MapReduce job is given a directory of 10 plain text files as its input directory. Each file is made up of 3 HDFS blocks. How many Mappers will run?

We cannot say; the number of Mappers is determined by the ResourceManager

We cannot say; the number of Mappers is determined by the developer

We cannot say; the number of Mappers is determined by the ApplicationMaster
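As a rough illustration of how input files relate to map tasks, the sketch below counts input splits under the assumption that the job uses the default text input handling, where each HDFS block of a plain text file becomes one input split and each split is processed by one Mapper. The function name is hypothetical and exists only for this example.

```python
def count_input_splits(num_files: int, blocks_per_file: int) -> int:
    """Estimate map tasks, assuming one input split per HDFS block."""
    return num_files * blocks_per_file


# Values from the question: 10 plain text files, 3 HDFS blocks each
mappers = count_input_splits(num_files=10, blocks_per_file=3)
print(f"Estimated Mappers: {mappers}")
```

A custom InputFormat or a changed split size would alter this count, so treat it as the default-case estimate only.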

Question # 13

Your cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. What is the result when you execute: hadoop jar SampleJar MyClass on a client machine?

SampleJar.jar is sent to the ApplicationMaster which allocates a container for SampleJar.jar

SampleJar.jar is placed in a temporary directory in HDFS

SampleJar.jar is sent directly to the ResourceManager

SampleJar.jar is serialized into an XML file which is submitted to the ApplicationMaster

Question # 14

Your cluster has the following characteristics:

A rack aware topology is configured and on
Replication is set to 3
Cluster block size is set to 64MB

Which describes the file read process when a client application connects into the cluster and requests a 50MB file?

The client queries the NameNode for the locations of the block, and reads all three copies. The first copy to complete transfer to the client is the one the client reads as part of Hadoop’s speculative execution framework.

The client queries the NameNode for the locations of the block, and reads from the first location in the list it receives.

The client queries the NameNode for the locations of the block, and reads from a random location in the list it receives to eliminate network I/O loads by balancing which nodes it retrieves data from at any given time.

The client queries the NameNode which retrieves the block from the nearest DataNode to the client then passes that block back to the client.
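The scenario in this question can be modeled with a small sketch. It assumes the NameNode returns replica locations ordered by network proximity to the client, so that the first entry is the closest replica. The distance values, node names, rack names, and function names below are illustrative assumptions, not taken from the exam.

```python
import math

# Illustrative network distances (assumed values for this sketch)
SAME_RACK = 2
OFF_RACK = 4


def blocks_for_file(file_mb: int, block_mb: int) -> int:
    """Number of HDFS blocks a file of the given size occupies."""
    return max(1, math.ceil(file_mb / block_mb))


def order_replicas(client_rack, replicas):
    """Sort (datanode, rack) pairs so the closest replica comes first."""
    def distance(replica):
        _, rack = replica
        return SAME_RACK if rack == client_rack else OFF_RACK
    return sorted(replicas, key=distance)


# A 50 MB file on a cluster with a 64 MB block size fits in one block
num_blocks = blocks_for_file(50, 64)

# Three replicas of that block (hypothetical placement)
ordered = order_replicas(
    "rack1",
    [("dn5", "rack2"), ("dn2", "rack1"), ("dn6", "rack2")],
)
first_choice = ordered[0][0]
print(num_blocks, first_choice)
```

In this model the client reads the single block from the first location in the ordered list and fetches the data directly from that DataNode rather than through the NameNode.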

Question # 15

You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which two daemons need to be installed on your cluster’s master nodes? (Choose two)

HMaster

ResourceManager

TaskManager

JobTracker

NameNode

DataNode

Question # 16

You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the network fabric. Which workloads benefit the most from faster network fabric?

When your workload generates a large amount of output data, significantly larger than the amount of intermediate data

When your workload consumes a large amount of input data, relative to the entire capacity of HDFS

When your workload consists of processor-intensive tasks

When your workload generates a large amount of intermediate data, on the order of the input data itself

Question # 17

You have recently converted your Hadoop cluster from a MapReduce 1 (MRv1) architecture to a MapReduce 2 (MRv2) on YARN architecture. Your developers are accustomed to specifying the number of map and reduce tasks (resource allocation) when they run jobs. A developer wants to know how to specify the number of reduce tasks when a specific job runs. Which method should you tell that developer to implement?

MapReduce version 2 (MRv2) on YARN abstracts resource allocation away from the idea of “tasks” into memory and virtual cores, thus eliminating the need for a developer to specify the number of reduce tasks, and indeed preventing the developer from specifying the number of reduce tasks.

In YARN, resource allocation is a function of megabytes of memory in multiples of 1024 MB. Thus, they should specify the amount of memory resource they need by executing -D mapreduce-reduces.memory-mb-2048

In YARN, the ApplicationMaster is responsible for requesting the resources required for a specific launch. Thus, executing -D yarn.applicationmaster.reduce.tasks=2 will specify that the ApplicationMaster launch two task containers on the worker nodes.

Developers specify reduce tasks in the exact same way for both MapReduce version 1 (MRv1) and MapReduce version 2 (MRv2) on YARN. Thus, executing -D mapreduce.job.reduces=2 will specify reduce tasks.

In YARN, resource allocation is a function of virtual cores specified by the ApplicationManager making requests to the NodeManager where a reduce task is handled by a single container (and thus a single virtual core). Thus, the developer needs to specify the number of virtual cores to the NodeManager by executing -p yarn.nodemanager.cpu-vcores=2

Question # 18

You use the hadoop fs -put command to add a file “sales.txt” to HDFS. This file is small enough that it fits into a single block, which is replicated to three nodes in your cluster (with a replication factor of 3). One of the nodes holding this file (a single block) fails. How will the cluster handle the replication of the file in this situation?

The file will remain under-replicated until the administrator brings that node back online

The cluster will re-replicate the file the next time the system administrator reboots the NameNode daemon (as long as the file’s replication factor doesn’t fall below)

This will be immediately re-replicated and all other HDFS operations on the cluster will halt until the cluster’s replication values are restored

The file will be re-replicated automatically after the NameNode determines it is under-replicated based on the block reports it receives from the DataNodes
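The idea of detecting under-replication from block reports can be sketched with a toy model. This is not HDFS code: the function, node names, and block ID are hypothetical, and the model only counts how many live nodes report each block and compares that count with the target replication factor.

```python
def find_under_replicated(block_reports, live_nodes, target_replication):
    """Return {block_id: missing_replica_count} for under-replicated blocks.

    block_reports maps a node name to the list of block IDs it last reported.
    Reports from nodes that are no longer live are ignored.
    """
    counts = {}
    for node, blocks in block_reports.items():
        if node not in live_nodes:
            continue
        for block_id in blocks:
            counts[block_id] = counts.get(block_id, 0) + 1
    return {
        block_id: target_replication - seen
        for block_id, seen in counts.items()
        if seen < target_replication
    }


# sales.txt occupies one block, originally held by three nodes
reports = {"dn1": ["blk_1"], "dn2": ["blk_1"], "dn3": ["blk_1"]}

# dn3 fails, so only two live nodes still hold the block
missing = find_under_replicated(reports, live_nodes={"dn1", "dn2"},
                                target_replication=3)
print(missing)
```

In this model the block shows one missing replica once the failed node drops out of the live set, which is the condition that would trigger a new copy being scheduled on another node.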
