A statistician notices gaps in data associated with age-related illnesses and wants to further aggregate these observations. Which of the following is the best technique to achieve this goal?
→ Binning (also known as discretization) involves grouping continuous variables into categories or bins. This technique is useful for aggregation, especially when analyzing trends across ranges (e.g., age groups: 0–18, 19–35, etc.).
In this case, aggregating observations by age ranges would help analyze age-related illnesses more clearly.
Why the other options are incorrect:
A: Label encoding is used to convert categorical values into numeric codes.
B: Linearization generally refers to transforming non-linear relationships into linear ones — not relevant here.
D: Imputing fills missing values, not aggregates or groups them.
Official References:
CompTIA DataX (DY0-001) Study Guide – Section 3.3:“Binning is used to group continuous data for summarization or pattern discovery. Often used in demographic analysis such as age ranges.”
Data Science for Business – Chapter 5:“Discretization simplifies complex continuous variables into interpretable categories, enhancing visualization and trend detection.”
Contribute your Thoughts:
Chosen Answer:
This is a voting comment (?). You can switch to a simple comment. It is better to Upvote an existing comment if you don't have anything to add.
Submit