Which one of the following is not a type of feature engineering transformation?
A. Scaling
B. Encoding
C. Aggregation
D. Normalization
Performance metrics are a part of every machine learning pipeline. Which of the following is not a performance metric used in machine learning?
A. R² (R-Squared)
B. Root Mean Squared Error (RMSE)
C. AU-ROC
D. AUM
Which Python method can a data scientist use to remove duplicates?
A. remove_duplicates()
B. duplicates()
C. drop_duplicates()
D. clean_duplicates()
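For reference, a minimal sketch of removing duplicate rows with pandas. The data frame and its column names are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"id": [1, 1, 2, 3, 3], "value": ["a", "a", "b", "c", "c"]})

# drop_duplicates() returns a new DataFrame with repeated rows removed,
# keeping the first occurrence of each row by default.
deduped = df.drop_duplicates()
print(deduped)
```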
Consider a data frame df with 10 rows and index ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the aggregate method shown in the code below do?
g = df.groupby(df.index.str.len())
g.aggregate({'A':len, 'B':np.sum})
A. Computes the sum of column A values
B. Computes the length of column A
C. Computes the length of column A and the sum of column B values for each group
D. Computes the length of column A and the sum of column B values
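The grouping in the question above can be tried out with a small sketch. The values in columns A and B are assumptions, since the question gives only the index, and the string "sum" is used here in place of np.sum to avoid deprecation warnings in recent pandas versions.

```python
import pandas as pd

index = ["r1", "r2", "r3", "row4", "row5", "row6", "r7", "r8", "r9", "row10"]

# Hypothetical column values; the question specifies only the index labels.
df = pd.DataFrame({"A": range(10), "B": range(10, 20)}, index=index)

# Group rows by the length of their index label (2, 4, or 5 characters).
g = df.groupby(df.index.str.len())

# Apply a different aggregation per column: row count for A, sum for B.
result = g.aggregate({"A": len, "B": "sum"})
print(result)
```

The result has one row per label length, so each aggregation is evaluated separately within every group.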
Which of the following is a useful tool for gaining insights into the relationship between features and predictions?
A. numpy plots
B. sklearn plots
C. Partial dependence plots (PDP)
D. Full dependence plots (FDP)
Select the correct mappings:
I. W (weights or coefficients of the independent variables in a linear regression model) --> Model Parameter
II. K in the K-Nearest Neighbour algorithm --> Model Hyperparameter
III. Learning rate for training a neural network --> Model Hyperparameter
IV. Batch Size --> Model Parameter
A. I,II
B. I,II,III
C. III,IV
D. II,III,IV
Which of the following are not types of window functions in Snowflake? Choose 2.
A. Rank-related functions.
B. Window frame functions.
C. Aggregation window functions.
D. Association functions.
Which of the following cross-validation approaches is suitable for quicker validation of very large datasets with hundreds of thousands of samples?
A. k-fold cross-validation
B. Leave-one-out cross-validation
C. Holdout method
D. All of the above
The skewness of a normal distribution is ___________
A. Negative
B. Positive
C. 0
D. Undefined
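Skewness can be estimated empirically. The sketch below is illustrative only: the helper sample_skewness, the seed, and the sample size are assumptions, and an exponential sample is included purely as an asymmetric contrast.

```python
import numpy as np

def sample_skewness(x):
    """Third standardized moment: mean((x - mean)^3) / mean((x - mean)^2)^1.5."""
    centered = x - x.mean()
    return (centered**3).mean() / (centered**2).mean() ** 1.5

# Fixed seed so the run is repeatable; the values are arbitrary choices.
rng = np.random.default_rng(0)
normal_samples = rng.normal(loc=0.0, scale=1.0, size=100_000)
exponential_samples = rng.exponential(scale=1.0, size=100_000)

normal_skew = sample_skewness(normal_samples)
exponential_skew = sample_skewness(exponential_samples)

print(f"normal sample skewness:      {normal_skew:.4f}")
print(f"exponential sample skewness: {exponential_skew:.4f}")
```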
How do you handle missing or corrupted data in a dataset?
A. Drop missing rows or columns
B. Replace missing values with mean/median/mode
C. Assign a unique category to missing values
D. All of the above
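The three strategies listed above can each be sketched in pandas. The data frame, its column names, and the "Unknown" label are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one row missing both a numeric and a categorical value.
df = pd.DataFrame(
    {
        "age": [25.0, np.nan, 40.0, 31.0],
        "city": ["Oslo", None, "Lima", "Oslo"],
    }
)

# Strategy 1: drop every row that contains a missing value.
dropped = df.dropna()

# Strategy 2: replace missing numeric values with the column median.
filled = df.assign(age=df["age"].fillna(df["age"].median()))

# Strategy 3: give missing categorical values their own category.
flagged = df.assign(city=df["city"].fillna("Unknown"))

print(dropped, filled, flagged, sep="\n\n")
```

Which strategy fits depends on how much data is missing and why, so this is a starting point rather than a recommendation.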