Pass the Google Cloud Certified Professional-Data-Engineer Questions and answers with ValidTests

Viewing page 11 out of 12 pages
Viewing questions 101-110 out of questions
Questions # 101:

You have one BigQuery dataset which includes customers' street addresses. You want to retrieve all occurrences of street addresses from the dataset. What should you do?

Options:

A.

Create a deep inspection job on each table in your dataset with Cloud Data Loss Prevention and create an inspection template that includes the STREET_ADDRESS infoType.

B.

Create a de-identification job in Cloud Data Loss Prevention and use the masking transformation.

C.

Write a SQL query in BigQuery by using REGEXP_CONTAINS on all tables in your dataset to find rows where the word "street" appears.

D.

Create a discovery scan configuration on your organization with Cloud Data Loss Prevention and create an inspection template that includes the STREET_ADDRESS infoType.
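
As a rough illustration of what an inspection scoped to the STREET_ADDRESS infoType looks like, the sketch below builds one Cloud Data Loss Prevention inspect-job request body per BigQuery table. The project, dataset, and table names are hypothetical, the inspection settings are written inline rather than through a stored inspection template, and the dictionaries only mirror the JSON shape of the request; nothing is sent to the API.

```python
def build_inspect_job(project_id, dataset_id, table_id):
    """Return an inspect-job request body for a single BigQuery table."""
    return {
        "inspectJob": {
            "storageConfig": {
                "bigQueryOptions": {
                    "tableReference": {
                        "projectId": project_id,
                        "datasetId": dataset_id,
                        "tableId": table_id,
                    }
                }
            },
            "inspectConfig": {
                # Restrict findings to street addresses only.
                "infoTypes": [{"name": "STREET_ADDRESS"}],
                # Ask for the matched text to be included in each finding.
                "includeQuote": True,
            },
        }
    }


# Hypothetical table names; one job body is built per table in the dataset.
tables = ["customers", "orders"]
jobs = [build_inspect_job("my-project", "crm", table) for table in tables]
```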

Questions # 102:

An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiently import the data into BigQuery while consuming as few resources as possible. What should you do?

Options:

A.

Use a standard Dataflow pipeline to store the raw data in BigQuery and then transform the format later when the data is used.

B.

Write a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source.

C.

Use Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format.

D.

Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format.

Questions # 103:

You are migrating a large number of files from a public HTTPS endpoint to Cloud Storage. The files are protected from unauthorized access using signed URLs. You created a TSV file that contains the list of object URLs and started a transfer job by using Storage Transfer Service. You notice that the job has run for a long time and eventually failed. Checking the logs of the transfer job reveals that the job was running fine until one point, and then it failed due to HTTP 403 errors on the remaining files. You verified that there were no changes to the source system. You need to fix the problem to resume the migration process. What should you do?

Options:

A.

Set up Cloud Storage FUSE, and mount the Cloud Storage bucket on a Compute Engine instance. Remove the completed files from the TSV file. Use a shell script to iterate through the TSV file and download the remaining URLs to the FUSE mount point.

B.

Update the file checksums in the TSV file from using MD5 to SHA256. Remove the completed files from the TSV file and rerun the Storage Transfer Service job.

C.

Renew the TLS certificate of the HTTPS endpoint. Remove the completed files from the TSV file and rerun the Storage Transfer Service job.

D.

Create a new TSV file for the remaining files by generating signed URLs with a longer validity period. Split the TSV file into multiple smaller files and submit them as separate Storage Transfer Service jobs in parallel.
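
The splitting step mentioned in option D can be sketched as below. This is a minimal illustration, assuming the remaining signed URLs have already been regenerated elsewhere; the URLs and the `sig=PLACEHOLDER` query string are hypothetical, and signing itself is not shown. Each output document begins with the `TsvHttpData-1.0` header line that Storage Transfer Service expects at the top of a URL list.

```python
URL_LIST_HEADER = "TsvHttpData-1.0"


def split_url_list(urls, chunk_size):
    """Split URLs into several URL-list documents, each with its own header."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    documents = []
    for start in range(0, len(urls), chunk_size):
        chunk = urls[start:start + chunk_size]
        # One header line followed by one URL per line.
        documents.append("\n".join([URL_LIST_HEADER, *chunk]) + "\n")
    return documents


# Hypothetical remaining files, already re-signed with a longer validity.
remaining = [
    f"https://example.com/files/{i}.bin?sig=PLACEHOLDER" for i in range(5)
]
parts = split_url_list(remaining, chunk_size=2)
```

Each element of `parts` could then be written to its own file and referenced by a separate transfer job.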

Questions # 104:

Your startup has a web application that currently serves customers out of a single region in Asia. You are targeting funding that will allow your startup to serve customers globally. Your current goal is to optimize for cost, and your post-funding goal is to optimize for global presence and performance. You must use a native JDBC driver. What should you do?

Options:

A.

Use Cloud Spanner to configure a single-region instance initially, and then configure multi-region Cloud Spanner instances after securing funding.

B.

Use a Cloud SQL for PostgreSQL highly available instance first, and Bigtable with US, Europe, and Asia replication after securing funding.

C.

Use a Cloud SQL for PostgreSQL zonal instance first, and Bigtable with US, Europe, and Asia after securing funding.

D.

Use a Cloud SQL for PostgreSQL zonal instance first, and Cloud SQL for PostgreSQL with highly available configuration after securing funding.

Questions # 105:

You work for a bank. You have a labelled dataset that contains information on already granted loan applications and whether these applications have defaulted. You have been asked to train a model to predict default rates for credit applicants.

What should you do?

Options:

A.

Increase the size of the dataset by collecting additional data.

B.

Train a linear regression to predict a credit default risk score.

C.

Remove the bias from the data and collect applications that have been declined loans.

D.

Match loan applicants with their social profiles to enable feature engineering.

Questions # 106:

You currently use a SQL-based tool to visualize your data stored in BigQuery. The data visualizations require the use of outer joins and analytic functions. Visualizations must be based on data that is no less than 4 hours old. Business users are complaining that the visualizations are too slow to generate. You want to improve the performance of the visualization queries while minimizing the maintenance overhead of the data preparation pipeline. What should you do?

Options:

A.

Create materialized views with the allow_non_incremental_definition option set to true for the visualization queries. Set the max_staleness parameter to 4 hours and the enable_refresh parameter to true. Reference the materialized views in the data visualization tool.

B.

Create views for the visualization queries. Reference the views in the data visualization tool.

C.

Create materialized views for the visualization queries. Use the incremental updates capability of BigQuery materialized views to handle changed data automatically. Reference the materialized views in the data visualization tool.

D.

Create a Cloud Function instance to export the visualization query results as Parquet files to a Cloud Storage bucket. Use Cloud Scheduler to trigger the Cloud Function every 4 hours. Reference the Parquet files in the data visualization tool.
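
For readers unfamiliar with the materialized view options named in option A, the sketch below renders the kind of BigQuery DDL statement in which they appear. It is only an illustration: the view name, table names, and query are hypothetical, and the function just builds a string without running anything against BigQuery.

```python
def materialized_view_ddl(view_name, query, staleness_hours):
    """Render CREATE MATERIALIZED VIEW DDL for a non-incremental view."""
    return (
        f"CREATE MATERIALIZED VIEW `{view_name}`\n"
        "OPTIONS (\n"
        "  enable_refresh = true,\n"
        f'  max_staleness = INTERVAL "{staleness_hours}" HOUR,\n'
        "  allow_non_incremental_definition = true\n"
        ")\n"
        f"AS {query}"
    )


# Hypothetical visualization query that uses an outer join.
query = (
    "SELECT c.region, SUM(o.amount) AS total "
    "FROM `my-project.sales.customers` AS c "
    "LEFT OUTER JOIN `my-project.sales.orders` AS o "
    "ON o.customer_id = c.id "
    "GROUP BY c.region"
)
ddl = materialized_view_ddl("my-project.sales.region_totals_mv", query, 4)
print(ddl)
```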

Questions # 107:

Your company is currently setting up data pipelines for their campaign. For all the Google Cloud Pub/Sub streaming data, one of the important business requirements is to be able to periodically identify the inputs and their timings during their campaign. Engineers have decided to use windowing and transformation in Google Cloud Dataflow for this purpose. However, when testing this feature, they find that the Cloud Dataflow job fails for all streaming inserts. What is the most likely cause of this problem?

Options:

A.

They have not assigned the timestamp, which causes the job to fail

B.

They have not set the triggers to accommodate the data coming in late, which causes the job to fail

C.

They have not applied a global windowing function, which causes the job to fail when the pipeline is created

D.

They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created
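
To make the windowing vocabulary in these options concrete, here is a plain-Python sketch (not the Dataflow or Apache Beam SDK) of how fixed, non-global windows bucket elements by their event timestamps. The function name and the sample events are hypothetical; the point is only that each element needs a timestamp and that a finite window size splits an unbounded stream into groups that can be aggregated.

```python
from collections import defaultdict


def assign_fixed_windows(events, window_seconds):
    """Group (timestamp, value) pairs into fixed windows keyed by (start, end)."""
    windows = defaultdict(list)
    for timestamp, value in events:
        # Window start is the timestamp rounded down to a multiple of the size.
        start = timestamp - (timestamp % window_seconds)
        windows[(start, start + window_seconds)].append(value)
    return dict(windows)


# Hypothetical events as (event time in seconds, payload).
events = [(0, "a"), (59, "b"), (60, "c"), (125, "d")]
windows = assign_fixed_windows(events, window_seconds=60)
```

With a single global window, by contrast, every element of an unbounded source would land in one group that never closes.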

Questions # 108:

Your neural network model is taking days to train. You want to increase the training speed. What can you do?

Options:

A.

Subsample your test dataset.

B.

Subsample your training dataset.

C.

Increase the number of input features to your model.

D.

Increase the number of layers in your neural network.
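
Subsampling, as mentioned in options A and B, simply means drawing a smaller random subset of a dataset. The sketch below shows one way to do that with the standard library; the function name, the 10% fraction, and the stand-in dataset are all hypothetical.

```python
import random


def subsample(examples, fraction, seed=0):
    """Return a random subset holding roughly `fraction` of the examples."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    if not examples:
        return []
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    k = max(1, int(len(examples) * fraction))
    return rng.sample(examples, k)  # sampling without replacement


# Stand-in for a list of training examples.
training_set = list(range(1000))
subset = subsample(training_set, 0.1)
```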

Questions # 109:

When you design a Google Cloud Bigtable schema it is recommended that you _________.

Options:

A.

Avoid schema designs that are based on NoSQL concepts

B.

Create schema designs that are based on a relational database design

C.

Avoid schema designs that require atomicity across rows

D.

Create schema designs that require atomicity across rows

Questions # 110:

You are developing a software application using Google's Dataflow SDK, and want to use conditionals, for loops, and other complex programming structures to create a branching pipeline. Which component will be used for the data processing operation?

Options:

A.

PCollection

B.

Transform

C.

Pipeline

D.

Sink API
