Coursera Google Data Analytics Professional Certificate, Data Analysis with R Programming (Week 3) Quiz Answers: Working with data in R
3. Working with data in R
Question 1
A data analyst is creating a new data frame. Their dataset has dates, currency, and text strings. What characteristic of data frames is this an instance of?
- Columns should contain the same number of items
- Columns should be named
- Data stored can be many different types
- Variables should be named
A data frame is a collection of columns. Characteristics of data frames include: all columns should be named, data stored can be many different types, and all columns should contain the same number of items. The dataset in question has a variety of data types, which is related to the idea that data stored can be many different types.
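As a minimal sketch of this idea, the data frame below mixes dates, currency amounts, and text. The name sales and all of its values are hypothetical, made up only for illustration.

```r
# Hypothetical data: one column each of dates, currency amounts, and text
sales <- data.frame(
  order_date = as.Date(c("2021-01-04", "2021-01-05", "2021-01-06")),
  amount_usd = c(19.99, 5.50, 120.00),
  customer   = c("Ana", "Ben", "Chloe"),
  stringsAsFactors = FALSE
)

# Each column keeps its own type
sapply(sales, class)

# Every column holds the same number of items
nrow(sales)
```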
Question 2
A data analyst is considering using tibbles instead of basic data frames. What are some of the limitations of tibbles? Select all that apply.
- Tibbles can never change the input type of the data
- Tibbles can overload a console
- Tibbles can never create row names
- Tibbles won’t automatically change the names of variables
Tibbles make large datasets easier to work with because they print only the first rows and the columns that fit on the screen. However, tibbles never change the input type of the data, never create row names, and never change the names of variables.
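The difference in how variable names are handled can be seen in a small comparison. This is a sketch that assumes the tibble package is installed; the column name and values are hypothetical.

```r
library(tibble)

# Base data frame: the non-syntactic name is rewritten by default
df <- data.frame(`bill length` = c(39.1, 46.5))
names(df)

# Tibble: the name is kept exactly as typed, and no row names are created
tb <- tibble(`bill length` = c(39.1, 46.5))
names(tb)
has_rownames(tb)
```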
Question 3
A data analyst is working with a large data frame. It contains so many columns that they don’t all fit on the screen at once. The analyst wants a quick list of all of the column names to get a better idea of what is in their data. What function should they use?
- head()
- str()
- colnames()
- mutate()
The colnames() function returns a character vector of all the column names in a data frame for easy reference.
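For example, using the ToothGrowth dataset that ships with R (the same dataset used in Question 4), a call might look like this:

```r
# ToothGrowth is part of R's built-in datasets package
cols <- colnames(ToothGrowth)
cols
```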
Question 4
A data analyst is working with the ToothGrowth dataset in R. What code chunk will allow them to get a quick summary of the dataset?
separate(ToothGrowth)
glimpse(ToothGrowth)
min(ToothGrowth)
colnames(ToothGrowth)
The code chunk is glimpse(ToothGrowth). The glimpse() function provides the analyst with a quick summary of the data in the ToothGrowth dataset. This function shows what all of the column names are and how many rows there are.
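A minimal sketch of this call, assuming the dplyr package is installed (glimpse() is not part of base R):

```r
library(dplyr)

# ToothGrowth ships with R, so no extra data needs to be loaded
glimpse(ToothGrowth)
```

The printed summary lists the number of rows and columns, followed by each column name with its type and first few values.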
Question 5
A data analyst is working with the penguins dataset. What code chunk does the analyst write to make sure all the column names are unique and consistent and contain only letters, numbers, and underscores?
drop_na(penguins)
rename(penguins)
clean_names(penguins)
select(penguins)
The code chunk is clean_names(penguins). The clean_names() function ensures that the column names in the data frame are unique and contain only letters, numbers, and underscores.
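The sketch below applies clean_names() to a small made-up data frame rather than the real penguins data. It assumes the janitor package is installed; the name messy and its column names are hypothetical.

```r
library(janitor)

# Hypothetical data frame with inconsistent column names
messy <- data.frame(
  `Bill Length (mm)` = c(39.1, 46.5),
  `Body Mass-G`      = c(3750, 5000),
  check.names = FALSE
)

# clean_names() returns the same data with tidied column names
cleaned <- clean_names(messy)
names(cleaned)
```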
Question 6
A data analyst is working with the penguins data. They write the following code:
penguins %>%
The variable species includes three penguin species: Adelie, Chinstrap, and Gentoo. What code chunk does the analyst add to create a data frame that only includes the Gentoo species?
filter(species == "Gentoo")
filter(Gentoo == species)
filter(species <- "Gentoo")
filter(species == "Adelie")
The code chunk is filter(species == "Gentoo"). The filter() function allows the data analyst to specify which rows of the data they want to keep. Two equal signs in an argument mean "exactly equal to." Using the == operator instead of the assignment operator <- keeps only the rows for Gentoo penguins in the resulting data frame.
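To show the full pipeline in one place, here is a sketch that uses a small stand-in for the penguins data. It assumes dplyr is installed; penguins_small and its values are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the penguins dataset
penguins_small <- data.frame(
  species     = c("Adelie", "Gentoo", "Chinstrap", "Gentoo"),
  body_mass_g = c(3750, 5000, 3800, 5400),
  stringsAsFactors = FALSE
)

# Keep only the rows where species is exactly "Gentoo"
gentoo_only <- penguins_small %>%
  filter(species == "Gentoo")

gentoo_only
```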
Question 7
A data analyst is working with the penguins dataset. They write the following code:
penguins %>%
group_by(species) %>%
What code chunk does the analyst add to find the mean value for the variable body_mass_g?
summarize(mean(body_mass_g))
summarize(=body_mass_g)
summarize(max(body_mass_g))
summarize(body_mass_g(mean))
The code chunk is summarize(mean(body_mass_g)). The summarize() function gives high-level information about a dataset. Because the data is grouped by species first, the result contains one mean value per species.
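The grouped summary can be sketched on a small made-up data frame, assuming dplyr is installed. The name penguins_small, its values, and the result column name mean_body_mass_g are hypothetical. The real penguins data contains missing values in body_mass_g, so na.rm = TRUE (or drop_na()) would likely be needed there to avoid NA results.

```r
library(dplyr)

# Hypothetical stand-in for the penguins dataset
penguins_small <- data.frame(
  species     = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  body_mass_g = c(3750, 3800, 5000, 5400, 3800),
  stringsAsFactors = FALSE
)

# One row per species, holding the mean body mass for that group
by_species <- penguins_small %>%
  group_by(species) %>%
  summarize(mean_body_mass_g = mean(body_mass_g, na.rm = TRUE))

by_species
```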
Question 8
A data analyst is working with a data frame named salary_data. They want to create a new column named wages that includes data from the rate column multiplied by 40. What code chunk lets the analyst create the wages column?
mutate(salary_data, wages = rate * 40)
mutate(salary_data, rate = wages * 40)
mutate(wages = rate * 40)
mutate(salary_data, wages = rate + 40)
The code chunk is mutate(salary_data, wages = rate * 40). The analyst can use the mutate() function to create a new column called wages that includes data from the rate column multiplied by 40. The mutate() function can create a new column without affecting any existing columns.
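A short sketch of this step, assuming dplyr is installed. The contents of salary_data below are hypothetical, since the quiz does not show the actual data.

```r
library(dplyr)

# Hypothetical salary data with an hourly rate column
salary_data <- data.frame(
  employee = c("A", "B", "C"),
  rate     = c(15, 20, 32.5),
  stringsAsFactors = FALSE
)

# Add wages = rate * 40; the existing rate column is left unchanged
with_wages <- mutate(salary_data, wages = rate * 40)
with_wages
```

Note that mutate() returns a new data frame, so the result has to be assigned if the analyst wants to keep the wages column.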
Question 9
A data analyst is working with a data frame named customers. It has separate columns for area code (area_code) and phone number (phone_num). The analyst wants to combine the two columns into a single column called phone_number, with the area code and phone number separated by a hyphen. What code chunk lets the analyst create the phone_number column?
unite(customers, "phone_number", area_code, phone_num, sep="-")
unite(customers, area_code, phone_num, sep="-")
unite(customers, "phone_number", area_code, phone_num)
unite(customers, "phone_number", area_code, sep="-")
The code chunk unite(customers, "phone_number", area_code, phone_num, sep="-") lets the analyst create the phone_number column. The unite() function lets the analyst combine the area code and phone number data into a single column. In the parentheses of the function, the analyst writes the name of the data frame, then the name of the new column in quotation marks, followed by the names of the two columns they want to combine. Finally, the argument sep="-" places a hyphen between the area code and phone number data in the phone_number column.
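The call can be tried on a small made-up customers table. This sketch assumes the tidyr package is installed; the names and phone numbers are hypothetical.

```r
library(tidyr)

# Hypothetical customers data with separate area code and number columns
customers <- data.frame(
  name      = c("Ana", "Ben"),
  area_code = c("212", "415"),
  phone_num = c("555-0100", "555-0199"),
  stringsAsFactors = FALSE
)

# Combine the two columns into phone_number, separated by a hyphen
combined <- unite(customers, "phone_number", area_code, phone_num, sep = "-")
combined
```

By default, unite() drops the two input columns from the result and keeps only the combined column.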
Question 10
A data analyst wants to summarize their data with the sd(), cor(), and mean() functions. What kind of measures are these?
- Numerical
- Summary
- Statistical
- Standard
Standard deviation, correlation, mean, maximum, and minimum are statistical measures which can be used to summarize data.
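As an illustration, these measures can be computed with base R on a small numeric vector. The vector scores is hypothetical.

```r
# Hypothetical numeric data
scores <- c(2, 4, 4, 4, 5, 5, 7, 9)

avg    <- mean(scores)  # arithmetic mean
spread <- sd(scores)    # sample standard deviation
lowest <- min(scores)
top    <- max(scores)

c(mean = avg, sd = spread, min = lowest, max = top)
```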
Question 11
In R, which statistical measure demonstrates how strong the relationship is between two variables?
- Maximum
- Standard deviation
- Correlation
- Average
Correlation measures how strong the relationship between two variables is. In R, it is calculated with the cor() function.
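A minimal base R sketch with two made-up variables; hours_studied and exam_score are hypothetical names and values.

```r
# Hypothetical paired observations
hours_studied <- c(1, 2, 3, 4, 5)
exam_score    <- c(52, 58, 61, 70, 74)

# By default cor() returns the Pearson correlation, a value between -1 and 1
r <- cor(hours_studied, exam_score)
r
```

Values close to 1 or -1 indicate a strong linear relationship, and values close to 0 indicate a weak one.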
Question 12
A data analyst is studying weather data. They write the following code chunk:
bias(actual_temp, predicted_temp)
What will this code chunk calculate?
The average difference between the actual and predicted values
The minimum difference between the actual and predicted values
The maximum difference between the actual and predicted values
The total average of the values
The bias() function calculates the average amount by which the actual and predicted outcomes differ, which helps determine whether the data model is biased.
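The bias() function is not part of base R; this sketch assumes it comes from the SimDesign package, as in the course. To avoid that dependency, the same average difference is computed by hand below. The temperature values are hypothetical.

```r
# Hypothetical actual and predicted temperatures
actual_temp    <- c(68, 70, 72, 71)
predicted_temp <- c(67, 69, 71, 70)

# Manual equivalent (an assumption about how bias() treats two vectors):
# the mean of the element-wise differences, actual minus predicted
avg_diff <- mean(actual_temp - predicted_temp)
avg_diff
```

A result near 0 suggests the predictions are not systematically too high or too low; a positive result here means the actual values tend to exceed the predictions.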