Content is user-generated and unverified.

R Programming Exam Guide

1. Conditional Statements

Definition: Control structures that execute code based on logical conditions.

Types:

  • if: Single condition
  • if-else: Two-way branching
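  • break: Exit a loop early (usually placed inside an if)
  • return: Exit a function early and hand back a value
  • next: Skip the rest of the current iteration and move to the next one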

Syntax & Examples:

r
# if statement
if (condition) {
  # code
}

# if-else
if (x > 0) {
  print("Positive")
} else {
  print("Non-positive")
}

# break in loop
for (i in 1:10) {
  if (i == 5) break
  print(i)
}

# return in function
my_func <- function(x) {
  if (x < 0) return("Negative")
  return("Non-negative")
}

# next in loop
for (i in 1:5) {
  if (i == 3) next
  print(i)
}
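
The examples above cover one-way and two-way branching. As a supplementary sketch, a chained else if handles more than two cases, and the vectorized ifelse() applies a two-way choice to every element of a vector. The function name classify is a hypothetical name chosen for illustration.

```r
# Chained conditions: only the first branch whose condition is TRUE runs
classify <- function(x) {
  if (x > 0) {
    "Positive"
  } else if (x < 0) {
    "Negative"
  } else {
    "Zero"
  }
}
labels <- sapply(c(-3, 0, 7), classify)
print(labels)

# ifelse() evaluates the condition element by element
signs <- ifelse(c(-3, 0, 7) > 0, "Positive", "Non-positive")
print(signs)
```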

2. Matrix Operations & Algebra

Definition: Mathematical operations performed on matrices in R.

Key Operations:

  • Creation: matrix(), rbind(), cbind()
  • Arithmetic: +, -, *, / (element-wise; * is not matrix multiplication)
  • Matrix multiplication: %*%
  • Transpose: t()
  • Inverse: solve() (square, non-singular matrices only)
  • Determinant: det()

Examples:

r
# Create matrices
A <- matrix(c(1,2,3,4), nrow=2, ncol=2)
B <- matrix(c(5,6,7,8), nrow=2, ncol=2)

# Operations
A + B          # Element-wise addition
A %*% B        # Matrix multiplication
t(A)           # Transpose
solve(A)       # Inverse
det(A)         # Determinant
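
A short check of the operations above, offered as a sketch: it contrasts element-wise * with %*%, confirms that A %*% solve(A) gives the identity matrix, and uses the two-argument form solve(A, b) to solve a linear system. The right-hand side b is an illustrative value, not part of the original example.

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2, ncol = 2)

elementwise <- A * B     # multiplies matching entries
product <- A %*% B       # row-by-column matrix product

# The inverse times the original matrix should be the identity (up to rounding)
A_inv <- solve(A)
is_identity <- isTRUE(all.equal(A %*% A_inv, diag(2)))
print(is_identity)

# solve(A, b) solves A x = b without forming the inverse explicitly
b <- c(5, 6)             # illustrative right-hand side
x <- solve(A, b)
print(x)
```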

3. Data Structures in R

Definition: Ways to organize and store data in R.

Types:

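  1. Vector: One-dimensional sequence of elements of the same type
  2. Matrix: Two-dimensional array of elements of the same type
  3. Array: Multi-dimensional array of elements of the same type
  4. List: Ordered collection whose elements may have different types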
  5. Data Frame: Table-like structure
  6. Factor: Categorical data

Examples:

r
# Vector
vec <- c(1, 2, 3, 4, 5)

# Matrix
mat <- matrix(1:6, nrow=2, ncol=3)

# Array
arr <- array(1:24, dim=c(2,3,4))

# List
lst <- list(numbers=1:5, names=c("A","B","C"))

# Data Frame
df <- data.frame(name=c("John","Jane"), age=c(25,30))

# Factor
fac <- factor(c("low","medium","high"), levels=c("low","medium","high"))  # explicit level order; the default is alphabetical

4. R Program for Data Structures

r
# Comprehensive Data Structure Program
cat("=== R Data Structures Demo ===\n")

# 1. Vector
cat("1. VECTOR:\n")
numeric_vec <- c(1, 2, 3, 4, 5)
character_vec <- c("apple", "banana", "orange")
logical_vec <- c(TRUE, FALSE, TRUE)
print(numeric_vec)
print(character_vec)
print(logical_vec)

# 2. Matrix
cat("\n2. MATRIX:\n")  # cat() interprets \n; print() would show it literally
mat <- matrix(1:12, nrow=3, ncol=4)
print(mat)
print(paste("Dimensions:", dim(mat)[1], "x", dim(mat)[2]))

# 3. Array
cat("\n3. ARRAY:\n")
arr <- array(1:24, dim=c(2,3,4))
print(arr)

# 4. List
cat("\n4. LIST:\n")
my_list <- list(
  numbers = 1:5,
  names = c("Alice", "Bob", "Charlie"),
  matrix = matrix(1:6, nrow=2),
  logical = c(TRUE, FALSE)
)
print(my_list)

# 5. Data Frame
cat("\n5. DATA FRAME:\n")
df <- data.frame(
  Name = c("John", "Jane", "Bob"),
  Age = c(25, 30, 35),
  Salary = c(50000, 60000, 70000),
  Married = c(TRUE, FALSE, TRUE)
)
print(df)

# 6. Factor
cat("\n6. FACTOR:\n")
education <- factor(c("High School", "Bachelor", "Master", "PhD"))
print(education)
print(levels(education))

5. User-Defined Functions

Definition: Custom functions created by users to perform specific tasks.

Syntax:

r
function_name <- function(parameters) {
  # function body
  return(value)
}

Example:

r
# Simple function
calculate_area <- function(length, width) {
  area <- length * width
  return(area)
}

# Function with default parameters
greet <- function(name, greeting = "Hello") {
  message <- paste(greeting, name)
  return(message)
}

# Usage
result <- calculate_area(5, 3)
print(result)  # Output: 15

msg <- greet("Alice")
print(msg)     # Output: "Hello Alice"

6. Looping Statements

Definition: Control structures that repeat code execution.

Types:

  1. for loop: Iterate over sequences
  2. while loop: Repeat while a condition is TRUE (checked before each iteration)
  3. repeat loop: Loop indefinitely until break is called

Examples:

r
# for loop
for (i in 1:5) {
  print(i)
}

# while loop
count <- 1
while (count <= 5) {
  print(count)
  count <- count + 1
}

# repeat loop
x <- 1
repeat {
  print(x)
  x <- x + 1
  if (x > 5) break
}

7. Data Visualization Program

r
# Comprehensive Data Visualization Program
# Create sample data
set.seed(123)
data <- data.frame(
  x = rnorm(100, mean=50, sd=10),
  y = rnorm(100, mean=30, sd=5),
  category = factor(sample(c("A", "B", "C"), 100, replace=TRUE))
)

# Set up plotting area
par(mfrow=c(2,3))

# 1. Plot (Scatter plot)
plot(data$x, data$y, 
     main="Scatter Plot", 
     xlab="X values", 
     ylab="Y values",
     col="blue", 
     pch=16)

# 2. Histogram
hist(data$x, 
     main="Histogram of X", 
     xlab="X values", 
     ylab="Frequency",
     col="lightblue", 
     breaks=10)

# 3. Line Chart
x_seq <- 1:20
y_seq <- x_seq^2
plot(x_seq, y_seq, 
     type="l", 
     main="Line Chart", 
     xlab="X", 
     ylab="Y",
     col="red", 
     lwd=2)

# 4. Pie Chart
pie_data <- table(data$category)
pie(pie_data, 
    main="Pie Chart", 
    col=c("red", "green", "blue"))

# 5. Box Plot
boxplot(data$x ~ data$category, 
        main="Box Plot", 
        xlab="Category", 
        ylab="Values",
        col=c("red", "green", "blue"))

# 6. Scatter Plot with colors
plot(data$x, data$y, 
     col=as.numeric(data$category),
     main="Colored Scatter Plot",
     xlab="X values", 
     ylab="Y values",
     pch=16)
legend("topright", legend=levels(data$category), 
       col=1:3, pch=16)

8. Data Visualization Concept

Definition: Graphical representation of data to reveal patterns, trends, and insights.

Key Components:

  • Aesthetics: Visual properties (color, size, shape)
  • Geoms: Geometric objects (points, lines, bars)
  • Scales: Mapping between data and aesthetics
  • Facets: Subplots for different data subsets

Types:

  • Scatter plots: Show relationships
  • Bar charts: Compare categories
  • Histograms: Show distributions
  • Line charts: Show trends over time
  • Box plots: Show data distribution and outliers

Example:

r
# Simple visualization example
library(ggplot2)
data(mtcars)

# Create visualization
ggplot(mtcars, aes(x=wt, y=mpg)) +
  geom_point(aes(color=factor(cyl))) +
  geom_smooth(method="lm") +
  labs(title="Car Weight vs MPG", 
       x="Weight", y="Miles per Gallon")

9. Bernoulli & Binomial Distribution PMF

Bernoulli Distribution:

  • Definition: Models single trial with two outcomes (success/failure)
  • PMF: P(X=k) = p^k * (1-p)^(1-k), where k ∈ {0,1}
  • Parameters: p (probability of success)

Binomial Distribution:

  • Definition: Models number of successes in n independent Bernoulli trials
  • PMF: P(X=k) = C(n,k) * p^k * (1-p)^(n-k)
  • Parameters: n (trials), p (success probability)

R Functions:

r
# Bernoulli (special case of binomial with n=1)
dbinom(1, size=1, prob=0.3)  # P(X=1)

# Binomial
dbinom(3, size=10, prob=0.5)  # P(X=3) with n=10, p=0.5
pbinom(3, size=10, prob=0.5)  # P(X≤3)
rbinom(100, size=10, prob=0.5)  # Generate 100 random values
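
To connect the PMF formula to the built-in function, the following sketch evaluates C(n,k) * p^k * (1-p)^(n-k) by hand and compares it with dbinom(). It also checks that the CDF from pbinom() is the running sum of the PMF. The variable names are illustrative.

```r
n <- 10
k <- 3
p <- 0.5

# Binomial PMF written out from the formula
manual_pmf <- choose(n, k) * p^k * (1 - p)^(n - k)
builtin_pmf <- dbinom(k, size = n, prob = p)
print(c(manual = manual_pmf, builtin = builtin_pmf))

# P(X <= 3) is the sum of P(X = 0), ..., P(X = 3)
manual_cdf <- sum(dbinom(0:k, size = n, prob = p))
builtin_cdf <- pbinom(k, size = n, prob = p)
print(c(manual = manual_cdf, builtin = builtin_cdf))
```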

10. Uniform & Normal Distribution PDF

Uniform Distribution:

  • Definition: Continuous distribution with constant density over the interval [a,b], so subintervals of equal length are equally likely
  • PDF: f(x) = 1/(b-a) for a ≤ x ≤ b, 0 otherwise
  • Parameters: a (minimum), b (maximum)

Normal Distribution:

  • Definition: Bell-shaped continuous distribution
  • PDF: f(x) = (1/(σ√(2π))) * e^(-(x-μ)²/(2σ²))
  • Parameters: μ (mean), σ (standard deviation)

R Functions:

r
# Uniform
dunif(0.5, min=0, max=1)      # PDF at x=0.5
punif(0.5, min=0, max=1)      # CDF at x=0.5
runif(100, min=0, max=1)      # Generate 100 random values

# Normal
dnorm(0, mean=0, sd=1)        # PDF at x=0
pnorm(0, mean=0, sd=1)        # CDF at x=0
rnorm(100, mean=0, sd=1)      # Generate 100 random values
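
As a sketch, both density formulas can be written as plain R functions and compared with dunif() and dnorm(). The helper names uniform_pdf and normal_pdf are hypothetical and exist only for this comparison.

```r
# Uniform density: 1/(b-a) inside [a, b], 0 outside
uniform_pdf <- function(x, a, b) {
  ifelse(x >= a & x <= b, 1 / (b - a), 0)
}

# Normal density: 1/(sigma*sqrt(2*pi)) * exp(-(x-mu)^2 / (2*sigma^2))
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x_unif <- c(-1, 0.5, 3, 6)
print(uniform_pdf(x_unif, a = 0, b = 4))
print(dunif(x_unif, min = 0, max = 4))

x_norm <- c(-2, 0, 1.5)
print(normal_pdf(x_norm, mu = 1, sigma = 2))
print(dnorm(x_norm, mean = 1, sd = 2))
```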

11. Types of Hypothesis Testing

Definition: Statistical method to make inferences about population parameters.

Types:

  1. One-sample tests: Compare sample to known value
  2. Two-sample tests: Compare two samples
  3. Parametric tests: Assume specific distribution
  4. Non-parametric tests: Distribution-free
  5. One-tailed tests: Directional hypothesis
  6. Two-tailed tests: Non-directional hypothesis

Common Tests:

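  • t-test: Compare means when the population standard deviation is unknown
  • z-test: Compare proportions (or means when the population standard deviation is known)
  • Chi-square test: Test independence of two categorical variables
  • F-test: Compare the variances of two groups
  • ANOVA: Compare means across three or more groups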
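
The tests listed above map onto base R functions. The sketch below runs several of them on the built-in mtcars data; the choice of variables is illustrative only, and wilcox.test() is included as one example of a non-parametric test.

```r
data(mtcars)

# F-test: compare mpg variances between automatic (am = 0) and manual (am = 1) cars
f_res <- var.test(mpg ~ am, data = mtcars)

# Chi-square test of independence between cylinder count and transmission type
# (suppressWarnings hides the small-expected-count warning this small table triggers)
chi_res <- suppressWarnings(chisq.test(table(mtcars$cyl, mtcars$am)))

# One-way ANOVA: compare mean mpg across the three cylinder groups
aov_res <- summary(aov(mpg ~ factor(cyl), data = mtcars))

# Non-parametric alternative to the two-sample t-test
# (exact = FALSE because the data contain ties)
w_res <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

print(c(F_test = f_res$p.value,
        chi_square = chi_res$p.value,
        wilcoxon = w_res$p.value))
print(aov_res)
```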

12. Components of Hypothesis Testing

Key Components:

  1. Null Hypothesis (H₀): Statement of no effect/difference
  2. Alternative Hypothesis (H₁): Statement of effect/difference
  3. Test Statistic: Calculated value from sample data
  4. P-value: Probability, assuming H₀ is true, of a test statistic at least as extreme as the one observed
  5. Significance Level (α): Threshold for rejecting H₀
  6. Critical Value: Boundary value for rejection region
  7. Decision Rule: Reject H₀ if p-value < α

Example:

r
# Testing if mean = 100
sample_data <- c(98, 102, 95, 103, 99, 101, 97, 104)
t.test(sample_data, mu=100, alternative="two.sided")
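
To make the components concrete, this sketch computes the test statistic, critical value, and p-value by hand for the same sample and applies the decision rule. The value alpha = 0.05 is an assumed significance level; the results should agree with t.test().

```r
sample_data <- c(98, 102, 95, 103, 99, 101, 97, 104)
mu0 <- 100      # value stated by H0
alpha <- 0.05   # assumed significance level
n <- length(sample_data)

# Test statistic: (sample mean - hypothesized mean) / standard error
t_stat <- (mean(sample_data) - mu0) / (sd(sample_data) / sqrt(n))

# Two-tailed critical value and p-value from the t distribution with n-1 df
crit_value <- qt(1 - alpha / 2, df = n - 1)
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Decision rule: reject H0 if p-value < alpha (equivalently |t| > critical value)
reject_h0 <- p_value < alpha
print(c(t = t_stat, critical = crit_value, p = p_value))
print(reject_h0)
```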

13. Testing Mean - T-tests

Definition: Statistical test to compare means when population standard deviation is unknown.

One-Sample T-test:

  • Purpose: Test if sample mean equals hypothesized value
  • Syntax: t.test(x, mu=value)

Two-Sample T-test:

  • Purpose: Compare means of two groups
  • Syntax: t.test(x, y) or t.test(x ~ group)

Examples:

r
# One-sample t-test
data1 <- c(12, 15, 13, 14, 16, 11, 17, 13, 15, 14)
t.test(data1, mu=13)

# Two-sample t-test
group1 <- c(20, 22, 19, 21, 23, 18, 24, 20, 22, 21)
group2 <- c(15, 17, 14, 16, 18, 13, 19, 15, 17, 16)
t.test(group1, group2)

# Paired t-test
before <- c(10, 12, 11, 13, 9, 14, 8, 15, 10, 12)
after <- c(8, 10, 9, 11, 7, 12, 6, 13, 8, 10)
t.test(before, after, paired=TRUE)
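
The formula syntax t.test(x ~ group) listed above expects the data in long format. A minimal sketch, reusing the two groups from the example; the data frame scores and its column names are hypothetical. The returned object is a list, so individual results can be extracted by name.

```r
group1 <- c(20, 22, 19, 21, 23, 18, 24, 20, 22, 21)
group2 <- c(15, 17, 14, 16, 18, 13, 19, 15, 17, 16)

# Long format: one column of values, one column identifying the group
scores <- data.frame(
  value = c(group1, group2),
  group = factor(rep(c("g1", "g2"), each = 10))
)

# Welch two-sample t-test (the default; var.equal = TRUE gives the pooled test)
res <- t.test(value ~ group, data = scores)

print(res$estimate)   # group means
print(res$p.value)    # p-value
print(res$conf.int)   # confidence interval for the difference in means
```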

14. Testing Proportion - Z-tests

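Definition: Statistical test to compare proportions using the normal approximation. R's prop.test() reports the equivalent chi-squared statistic (z² when correct=FALSE) and applies a continuity correction by default.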

One-Sample Z-test:

  • Purpose: Test if sample proportion equals hypothesized value
  • Syntax: prop.test(x, n, p=value)

Two-Sample Z-test:

  • Purpose: Compare proportions of two groups
  • Syntax: prop.test(c(x1, x2), c(n1, n2))

Examples:

r
# One-sample proportion test
# Test if proportion = 0.5 with 60 successes out of 100 trials
prop.test(60, 100, p=0.5)

# Two-sample proportion test
# Group 1: 45 successes out of 100
# Group 2: 30 successes out of 80
prop.test(c(45, 30), c(100, 80))

# Manual z-test calculation
z_test <- function(x, n, p0) {
  p_hat <- x/n
  se <- sqrt(p0 * (1-p0) / n)
  z <- (p_hat - p0) / se
  p_value <- 2 * (1 - pnorm(abs(z)))
  return(list(z_statistic=z, p_value=p_value))
}
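
A usage sketch for the manual function, applied to the one-sample example (60 successes out of 100, p0 = 0.5). With correct=FALSE, prop.test() should give a chi-squared statistic equal to z² and the same p-value; with the default continuity correction the numbers differ slightly.

```r
z_test <- function(x, n, p0) {
  p_hat <- x / n
  se <- sqrt(p0 * (1 - p0) / n)
  z <- (p_hat - p0) / se
  p_value <- 2 * (1 - pnorm(abs(z)))
  list(z_statistic = z, p_value = p_value)
}

manual <- z_test(60, 100, 0.5)
print(manual)

# Same test via prop.test() without the continuity correction
builtin <- prop.test(60, 100, p = 0.5, correct = FALSE)
print(unname(builtin$statistic))   # chi-squared statistic, equal to z^2
print(builtin$p.value)
```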

15. Binary Variable Linear Regression

Definition: Regression with a binary (0/1) dependent variable. A straight-line fit can predict values outside [0, 1], so logistic regression is typically used instead.

Key Concepts:

  • Logistic Regression: Uses logit link function
  • Odds Ratio: Exponential of coefficients
  • Probability: P(Y=1) = e^(β₀+β₁X) / (1 + e^(β₀+β₁X))

Example:

r
# Create binary outcome data
set.seed(123)
x <- rnorm(100, mean=50, sd=10)
prob <- exp(-2 + 0.1*x) / (1 + exp(-2 + 0.1*x))
y <- rbinom(100, 1, prob)

# Fit logistic regression
model <- glm(y ~ x, family=binomial)
summary(model)

# Predictions
predictions <- predict(model, type="response")
binary_pred <- ifelse(predictions > 0.5, 1, 0)

# Model evaluation
table(y, binary_pred)
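
The key concepts can be read off the fitted model. This sketch refits the same simulated data, exponentiates the coefficients to get odds ratios, and evaluates the probability formula by hand at x = 50 (an arbitrary illustrative value) to compare with predict().

```r
set.seed(123)
x <- rnorm(100, mean = 50, sd = 10)
prob <- exp(-2 + 0.1 * x) / (1 + exp(-2 + 0.1 * x))
y <- rbinom(100, 1, prob)
model <- glm(y ~ x, family = binomial)

# Odds ratios: exp(beta1) is the multiplicative change in odds per unit of x
odds_ratios <- exp(coef(model))
print(odds_ratios)

# P(Y=1) at x = 50 from the logistic formula
b <- unname(coef(model))
eta <- b[1] + b[2] * 50
p50_manual <- exp(eta) / (1 + exp(eta))
p50_pred <- unname(predict(model, newdata = data.frame(x = 50), type = "response"))
print(c(manual = p50_manual, predict = p50_pred))

# Share of observations classified correctly at the 0.5 cutoff
accuracy <- mean(ifelse(fitted(model) > 0.5, 1, 0) == y)
print(accuracy)
```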

16. Multiple Variable Linear Regression

Definition: Regression with multiple independent variables predicting one dependent variable.

Equation: Y = β₀ + β₁X₁ + β₂X₂ + ... + βₚXₚ + ε

Key Concepts:

  • Multiple R-squared: Proportion of variance explained
  • Adjusted R-squared: Penalizes for number of variables
  • F-statistic: Overall model significance
  • Multicollinearity: High correlation between predictors

Example:

r
# Multiple regression example
data(mtcars)
model <- lm(mpg ~ wt + hp + cyl + disp, data=mtcars)
summary(model)

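# Check assumptions (draws four diagnostic plots on one page)
par(mfrow=c(2,2))
plot(model)

# Variance Inflation Factor (VIF)
# Requires the car package: install.packages("car")
library(car)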
vif(model)
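
As a sketch of how adjusted R-squared penalizes the number of predictors, the following recomputes it from multiple R-squared using 1 - (1 - R²)(n - 1)/(n - p - 1) and compares it with the value stored in the summary object. The correlation matrix of the predictors is a quick base-R look at multicollinearity that needs no extra package.

```r
data(mtcars)
model <- lm(mpg ~ wt + hp + cyl + disp, data = mtcars)
s <- summary(model)

n <- nrow(mtcars)   # number of observations
p <- 4              # number of predictors

adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)
print(c(r_squared = s$r.squared,
        adjusted_manual = adj_manual,
        adjusted_summary = s$adj.r.squared))

# Pairwise correlations between predictors; values near +1 or -1 suggest multicollinearity
print(round(cor(mtcars[, c("wt", "hp", "cyl", "disp")]), 2))
```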

17. Linear Regression Functions

a) lm() - Linear Model:

  • Purpose: Fit linear regression models
  • Syntax: lm(formula, data)
  • Example: lm(y ~ x1 + x2, data=df)

b) coef() - Coefficients:

  • Purpose: Extract model coefficients
  • Syntax: coef(model)
  • Example: coef(lm(mpg ~ wt, data=mtcars))

c) summary() - Model Summary:

  • Purpose: Detailed model statistics
  • Syntax: summary(model)
  • Returns: Coefficients, R-squared, F-statistic, p-values

d) confint() - Confidence Intervals:

  • Purpose: Confidence intervals for coefficients
  • Syntax: confint(model, level=0.95)
  • Example: confint(model, level=0.99)

e) predict() - Predictions:

  • Purpose: Make predictions from model
  • Syntax: predict(model, newdata)
  • Example: predict(model, newdata=data.frame(x=c(1,2,3)))

Complete Example:

r
# Comprehensive linear regression example
data(mtcars)
model <- lm(mpg ~ wt + hp, data=mtcars)

# a) Model creation
print("Model:")
print(model)

# b) Coefficients
print("Coefficients:")
print(coef(model))

# c) Summary
print("Summary:")
print(summary(model))

# d) Confidence intervals
print("Confidence Intervals:")
print(confint(model, level=0.95))

# e) Predictions
new_data <- data.frame(wt=c(2.5, 3.0, 3.5), hp=c(100, 150, 200))
predictions <- predict(model, newdata=new_data)
print("Predictions:")
print(predictions)

18. Forward & Backward Selection

Definition: Automated variable selection methods for model building.

Forward Selection:

  • Process: Start with no variables, add variables based on significance
  • Criterion: Add variable with lowest p-value (if < α)
  • Stops: When no variables meet inclusion criteria

Backward Selection:

  • Process: Start with all variables, remove non-significant variables
  • Criterion: Remove variable with highest p-value (if > α)
  • Stops: When all remaining variables are significant

Example:

r
# Stepwise selection with stepAIC() from MASS
# Note: stepAIC() adds or drops terms by AIC, not by p-value
library(MASS)
data(mtcars)

# Start with null model
null_model <- lm(mpg ~ 1, data=mtcars)
full_model <- lm(mpg ~ ., data=mtcars)

# Forward selection
forward_model <- stepAIC(null_model, 
                        scope=list(lower=null_model, upper=full_model),
                        direction="forward")

# Backward selection
backward_model <- stepAIC(full_model, direction="backward")

# Both directions
both_model <- stepAIC(null_model,
                     scope=list(lower=null_model, upper=full_model),
                     direction="both")
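
The p-value criterion described above can be illustrated with a single forward step using base R's add1(), which reports an F-test p-value for each candidate. This is a sketch of one step only; the candidate set and alpha = 0.05 are assumptions made for the example.

```r
data(mtcars)
alpha <- 0.05   # assumed inclusion threshold

# Start with the intercept-only model
null_model <- lm(mpg ~ 1, data = mtcars)

# F-test p-value for adding each candidate variable on its own
candidates <- add1(null_model, scope = ~ wt + hp + cyl + disp, test = "F")
print(candidates)

# Pick the candidate with the lowest p-value (the <none> row is NA and is skipped)
p_values <- candidates[["Pr(>F)"]]
best <- rownames(candidates)[which.min(p_values)]
print(best)

# Add it only if it meets the inclusion criterion
if (min(p_values, na.rm = TRUE) < alpha) {
  step1_model <- update(null_model, as.formula(paste(". ~ . +", best)))
  print(formula(step1_model))
}
```

Repeating the add1() call on step1_model would give the next forward step; drop1() plays the same role for backward selection.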

19. Poisson Distribution

Definition: Models count data - number of events in fixed interval.

Characteristics:

  • Parameter: λ (lambda) - mean and variance
  • PMF: P(X=k) = (e^(-λ) * λ^k) / k!
  • Mean: λ
  • Variance: λ

R Functions:

r
# Poisson distribution functions
lambda <- 3

# Probability mass function
dpois(2, lambda)          # P(X=2)

# Cumulative distribution function
ppois(2, lambda)          # P(X≤2)

# Quantile function
qpois(0.5, lambda)        # Median

# Random number generation
rpois(100, lambda)        # Generate 100 random values

# Example: Modeling number of calls per hour
calls <- rpois(24, lambda=5)  # 24 hours, average 5 calls/hour
hist(calls, main="Poisson Distribution - Calls per Hour")
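
A brief sketch checking the PMF formula against dpois() and illustrating that the mean and variance are both close to λ in a large simulated sample. The seed and sample size are arbitrary choices.

```r
lambda <- 3
k <- 2

# PMF from the formula: e^(-lambda) * lambda^k / k!
manual_pmf <- exp(-lambda) * lambda^k / factorial(k)
print(c(manual = manual_pmf, builtin = dpois(k, lambda)))

# Sample mean and variance should both be near lambda
set.seed(42)
draws <- rpois(10000, lambda)
print(c(mean = mean(draws), variance = var(draws)))
```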

20. Features of R Programming

Key Features:

  1. Open Source: Free and community-driven
  2. Statistical Computing: Built for data analysis
  3. Vectorized Operations: Efficient element-wise operations
  4. Data Structures: Comprehensive data types
  5. Graphics: Advanced plotting capabilities
  6. Packages: Extensive library ecosystem (CRAN)
  7. Cross-Platform: Works on Windows, Mac, Linux
  8. Memory Management: Automatic garbage collection
  9. Interactive Environment: Command-line interface
  10. Extensible: Can integrate with C, C++, Python

Advantages:

  • Rich statistical functions
  • Excellent data visualization
  • Large community support
  • Reproducible research
  • Integration with databases
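
As an illustration of the vectorized operations feature, the sketch below computes the same totals with an explicit loop and with a single vectorized expression. The price and quantity values are made up for the example.

```r
prices <- c(10, 20, 30)   # illustrative values
quantities <- c(1, 2, 3)

# Loop version: one element at a time
line_totals_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  line_totals_loop[i] <- prices[i] * quantities[i]
}

# Vectorized version: the operator works on whole vectors
line_totals <- prices * quantities
grand_total <- sum(line_totals)

print(line_totals)
print(grand_total)
```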

21. Exception Handling

Definition: Mechanism to handle errors and unexpected conditions in code.

Functions:

  • try(): Execute code and catch errors
  • tryCatch(): More comprehensive error handling
  • stop(): Generate custom errors
  • warning(): Generate warnings

Examples:

r
# Simple try
# log(-1) only warns and returns NaN; a non-numeric argument raises a real error
result <- try(log("a"), silent=TRUE)
if (inherits(result, "try-error")) {
  print("Error occurred")
}

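# tryCatch with multiple handlers
# 10/0 returns Inf in R without signaling a condition,
# so log(-1), which raises a warning, is used to trigger a handler
tryCatch({
  x <- log(-1)
  print("This won't print")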
}, error = function(e) {
  print(paste("Error:", e$message))
}, warning = function(w) {
  print(paste("Warning:", w$message))
}, finally = {
  print("This always executes")
})

# Custom error function
safe_divide <- function(x, y) {
  if (y == 0) {
    stop("Division by zero not allowed")
  }
  return(x/y)
}
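
A usage sketch for the custom error function, together with warning(), which is listed above but not demonstrated. The function check_age and its message are hypothetical; conditionMessage() is the standard accessor for the text of an error or warning.

```r
safe_divide <- function(x, y) {
  if (y == 0) {
    stop("Division by zero not allowed")
  }
  x / y
}

# Recover from the custom error and fall back to NA
result <- tryCatch(
  safe_divide(10, 0),
  error = function(e) {
    message("Caught: ", conditionMessage(e))
    NA
  }
)
print(result)

# warning() signals a problem without stopping execution (hypothetical example)
check_age <- function(age) {
  if (age < 0) warning("Negative age supplied")
  age
}

# Capture the warning text instead of letting it print
warn_text <- tryCatch(
  check_age(-1),
  warning = function(w) conditionMessage(w)
)
print(warn_text)
```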

22. Vector Functions

Definition: Functions that operate on vectors in R.

Key Functions:

  • Creation: c(), rep(), seq(), :
  • Information: length(), class(), typeof()
  • Subsetting: [], head(), tail()
  • Manipulation: append(), sort(), rev()
  • Mathematical: sum(), mean(), max(), min()
  • Logical: which(), any(), all()

Examples:

r
# Vector creation
vec1 <- c(1, 2, 3, 4, 5)
vec2 <- rep(2, 5)
vec3 <- seq(1, 10, by=2)
vec4 <- 1:10

# Vector information
length(vec1)
class(vec1)
typeof(vec1)

# Subsetting
vec1[c(1, 3, 5)]
head(vec1, 3)
tail(vec1, 3)

# Manipulation
append(vec1, 6)
sort(vec1, decreasing=TRUE)
rev(vec1)

# Mathematical operations
sum(vec1)
mean(vec1)
max(vec1)
min(vec1)

# Logical operations
which(vec1 > 3)
any(vec1 > 3)
all(vec1 > 0)

23. Logical Operators

Definition: Operators that work with logical values (TRUE/FALSE).

Types:

  • &: Element-wise AND
  • |: Element-wise OR
  • !: NOT
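  • &&: Scalar AND with short-circuit evaluation (length-one operands; longer vectors are an error since R 4.3.0)
  • ||: Scalar OR with short-circuit evaluation (same length-one restriction)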
  • ==: Equal to
  • !=: Not equal to
  • <, >, <=, >=: Comparison operators
  • %in%: Element matching

Examples:

r
# Create logical vectors
x <- c(TRUE, FALSE, TRUE, FALSE)
y <- c(TRUE, TRUE, FALSE, FALSE)

# Logical operations
x & y          # Element-wise AND
x | y          # Element-wise OR
!x             # NOT
x[1] && y[1]   # Scalar AND (operands must be length one)
x[1] || y[1]   # Scalar OR (operands must be length one)

# Comparison operations
a <- c(1, 2, 3, 4, 5)
b <- c(1, 3, 3, 2, 5)

a == b         # Equal to
a != b         # Not equal to
a > b          # Greater than
a <= b         # Less than or equal to

# Element matching
a %in% c(1, 3, 5)

# Practical example
data <- c(10, 25, 30, 15, 40, 35)
filtered <- data[data > 20 & data < 35]
print(filtered)  # Output: 25 30
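
A small sketch of why the scalar operators matter in conditions: && stops evaluating as soon as the result is known, so later checks can safely depend on earlier ones. The helper is_positive_scalar is a hypothetical name.

```r
# Each check runs only if the previous one was TRUE
is_positive_scalar <- function(v) {
  !is.null(v) && length(v) == 1 && v > 0
}

print(is_positive_scalar(5))        # all three checks pass
print(is_positive_scalar(NULL))     # stops at the first check
print(is_positive_scalar(c(1, 2)))  # stops at the length check

# Element-wise & is the right choice for filtering vectors
values <- c(10, 25, 30, 15, 40, 35)
in_range <- values > 20 & values < 35
print(values[in_range])
```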

Good luck with your R programming exam! Review these concepts and practice the code examples.
