NACA 0012 Airfoil Noise Prediction based on Wind Tunnel Testing

The following measurements have been made from the wind tunnel:

  1. Frequency, in Hertz.
  2. Angle of attack, in degrees.
  3. Chord length, in meters.
  4. Free-stream velocity, in meters per second.
  5. Suction side displacement thickness, in meters.

The only output is:

  1. Scaled sound pressure level, in decibels.

The entire dataset is available from the UCI Machine Learning Repository. The following are some relevant papers on the dataset:

T.F. Brooks, D.S. Pope, and A.M. Marcolini. Airfoil self-noise and prediction. Technical report, NASA RP-1218, July 1989.

K. Lau. A neural networks approach for aerofoil noise prediction. Master's thesis, Department of Aeronautics, Imperial College of Science, Technology and Medicine (London, United Kingdom), 2006.

R. Lopez. Neural Networks for Variational Problems in Engineering. PhD Thesis, Technical University of Catalonia, 2008.

Motivation for current work

Noise is fast becoming a major constraint on modern wind turbines. As rotors grow larger, tip speeds inevitably become quite high, resulting in noisy operation. This poses a severe constraint on the siting of wind farms, potentially rendering an otherwise favourable site unfavourable.

With modern turbines breaching the 100 dB threshold (almost as loud as a snowmobile or a chainsaw), noise is a crucial factor in the design of modern airfoils for wind turbines.

The current industry practice is to use simple semi-empirical aeroacoustic prediction tools dedicated to trailing-edge noise and to integrate the results into the design cycle. Full-scale Navier-Stokes modelling of airfoils and blades is possible, but it is highly resource-intensive and sensitive to initial and boundary conditions. Additionally, the fine details of the trailing-edge boundary-layer behaviour are quite hard to capture, requiring a very fine mesh.

Wind tunnel tests provide an alternative: a reliable, albeit expensive, way to measure the noise of airfoils experimentally under operating conditions.

Current Methodology

The objective of this analysis is to find a parsimonious model that satisfactorily predicts the noise (in dB) produced by the NACA 0012 airfoil under a given set of operating conditions and frequencies.

The process involves a combination of exploring possible interactions between variables via exploratory graphs, splitting the data into training and test sets, and determining and eliminating “low influence” variables.

The lattice and caret packages will be used extensively for these operations. Regression, random forest, boosting, and bagging models will be developed and their predictions compared.

Set-up

library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(lattice)
library(manipulate)
library(xtable)
dat <- read.table(
        "C:/Users/S.Srisai/Documents/working_directory/R/datasets/airfoil_noise/airfoil_self_noise.dat")

names(dat) <- c("frequency","aoa","chord","vinf","delta","noise")

Identifying Correlated Variables

The next step is identifying any correlated variables.

descrCor <-  cor(dat)
summary(descrCor[upper.tri(descrCor)])
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.50490 -0.25450 -0.15610 -0.08381  0.03127  0.75340 

# Check for perfect correlation
highCorr <- sum(abs(descrCor[upper.tri(descrCor)]) > .999)
highCorr

[1] 0

# Find highly correlated variables - correlation cut-off set at 0.75
highlyCorDescr <- findCorrelation(descrCor, cutoff = .75)
highlyCorDescr

[1] 2

# Ascertaining the correlations

xyplot(delta ~ aoa, dat, grid = TRUE, type = c("p", "smooth"), col.line = "darkorange", lwd = 2)

# dat <- dat[,-highlyCorDescr]
# findCorrelation flags aoa (column 2) for removal; delta is eliminated
# manually instead, since aoa is retained as a primary operating variable.
dat <- dat[,-5]

Reynolds Number

Two of the measured input variables are chord length and free-stream velocity. Both can be summarized and replaced by the Reynolds number, which is defined as:

Re = Free-Stream Velocity * Characteristic Length / Kinematic Viscosity

The kinematic viscosity of air at 300 K (27°C) is 1.568e-5 m²/s, as noted in the Engineering Toolbox.

Re will replace vinf and chord.

Mutating re and removing chord and vinf:

Re = vinf * chord / nu

nu <- 1.568e-5 
dat$re <- dat$chord * dat$vinf / nu
dat <- dat[,-c(3,4)]
dat <- dat[,c(1:2,4,3)]
names(dat)

[1] "frequency" "aoa"       "re"        "noise"    

Exploratory Plots

Exploratory Plots: Observation

The exploratory plots reveal the following pattern: for all angles of attack, noise generally increases with frequency up to around 1000-5000 Hz, after which it decreases with increasing frequency.
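A conditioned lattice plot along the following lines makes this trend visible. This is a sketch only: it uses a synthetic stand-in data frame `demo_dat` so it runs on its own; in the report, the real `dat` (with columns `frequency`, `aoa`, and `noise`) would be used instead, and the four-shingle conditioning via `equal.count` is one choice among several.

```r
library(lattice)

# Synthetic stand-in for the airfoil data (hypothetical values, for the
# sketch only); in the report, use `dat` in place of `demo_dat`.
set.seed(1)
demo_dat <- data.frame(frequency = rep(c(200, 1000, 5000, 10000), each = 25),
                       aoa       = runif(100, 0, 22),
                       noise     = rnorm(100, mean = 120, sd = 5))

# Noise vs. frequency, one panel per angle-of-attack band; equal.count()
# bins the continuous aoa variable into overlapping shingles.
p <- xyplot(noise ~ frequency | equal.count(aoa, number = 4),
            data = demo_dat, type = c("p", "smooth"),
            scales = list(x = list(log = 10)),  # frequency spans decades
            xlab = "Frequency (Hz, log scale)", ylab = "Noise (dB)")
print(p)
```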

Model Set-up: NZV

Before getting started with any modelling, it is important to eliminate predictors that take a unique value across samples or exhibit little variance. Such predictors are not only non-informative, they can break some of the models you may want to fit to your data. One solution is to remove all predictors that fall below a threshold criterion related to their variance.

By doing so, an explicit assumption is made that all those predictors are non-informative, which may not necessarily be true, especially for the near-zero variance ones. Near-zero variance predictors can in fact turn out to be very informative. For example, assume a binary predictor in a classification problem has many zeroes and few ones (a near-zero variance predictor); if those rare ones line up almost perfectly with one class, the predictor is highly predictive despite its low variance.

So caution is recommended when discarding such NZV features.

NZV <- nearZeroVar(dat, saveMetrics= TRUE)
print(xtable(NZV), type="html")
          freqRatio percentUnique zeroVar   nzv
frequency      1.01          1.40   FALSE FALSE
aoa            3.54          1.80   FALSE FALSE
re             1.02          1.60   FALSE FALSE
noise          1.00         96.87   FALSE FALSE
# remove any NZVs
data <- dat[,!NZV$nzv]

Reproducibly setting up the training and test sets

set.seed(100)
inTrain <- createDataPartition(y=data$noise, p=0.70, list=FALSE)
training <- data[inTrain,]
testing <- data[-inTrain,]

Support function

A support function was implemented to handle the repeated tasks of plotting the predictions and computing their errors. It takes a model as its argument and returns the in-sample and out-of-sample RMSE of the model.

eval_model <- function(model) {
        
        pred_train <- predict(model,newdata = training)
        pred_test <- predict(model,newdata = testing)
        
        # Scatter plots of predictions on Training and Testing sets
        plot(pred_train,training$noise,xlim=c(100,150),ylim=c(100,150),col=1,
             pch=19,xlab = "Predicted Noise (dB)",ylab = "Actual Noise(dB)")
        points(pred_test,testing$noise,col=2,pch=19) 
        leg <- c("Training","Testing")
        legend(100, 150, leg, col = c(1, 2),pch=c(19,19))
        
        # Scatter plots of % error on predictions on Training and Testing sets
        par(mfrow = c(2, 1))
        par(cex = 0.6)
        par(mar = c(5, 5, 3, 0), oma = c(2, 2, 2, 2))
        plot((pred_train - training$noise)* 100 /training$noise,
             ylab = "% Error of Prediction", xlab = "Index",
             ylim = c(-5,5),col=1,pch=19)
        legend(0, 4.5, "Training", col = 1,pch=19)
        plot((pred_test-testing$noise)* 100 /testing$noise,
             ylab = "% Error of Prediction",  xlab = "Index",
             ylim = c(-5,5),col=2,pch=19)
        legend(0, 4.5, "Testing", col = 2,pch=19)
        
        # Actual data Vs Predictions superimposed for Training and Testing Data
        plot(1:length(training$noise),training$noise,pch=21,col=1,
             main = "Training: Actual Noise Vs Predicted Noise",
             xlab = "Index",ylab = "Noise (dB)")
        points(1:length(training$noise),pred_train,pch=21,col=2)
        #leg <- c("Training","Predicted Training")
        legend(0, 140, c("Actual","Predicted"), col = c(1, 2),pch=c(21,21))
        plot(1:length(testing$noise),testing$noise,pch=21,col=1,
             main = "Testing: Actual Noise Vs Predicted Noise",
             xlab = "Index",ylab = "Noise (dB)")
        points(1:length(testing$noise),pred_test,pch=21,col="red")
        legend(0, 140, c("Actual","Predicted"), col = c(1, 2),pch=c(21,21))
        
        ## Line graph of errors
        plot(pred_train-training$noise,type='l',ylim=c(-5,+5),
             xlab = "Index",ylab = "Actual - Predicted",main="Training")        
        plot(pred_test-testing$noise,type='l',ylim=c(-5,+5),
             xlab = "Index",ylab = "Actual - Predicted",main="Testing")
                
        ISRMSE<- sqrt(mean((pred_train-training$noise)^2))
        OSRMSE<- sqrt(mean((pred_test-testing$noise)^2))
        
        return(c( ISRMSE,OSRMSE))
}

Regression- Caret

ans_reg <- train(noise ~., data=training,method="lm")
#summary(ans_reg)
print(xtable(summary(ans_reg)), type="html")
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 134.2848     0.5008  268.16   0.0000
frequency    -0.0011     0.0001  -18.36   0.0000
aoa          -0.5103     0.0343  -14.88   0.0000
re           -0.0000     0.0000  -11.69   0.0000
reg_r_adjusted <- summary(ans_reg)["adj.r.squared"]
reg_r_adjusted

$adj.r.squared
[1] 0.3170736

reg <- eval_model(ans_reg)

reg

[1] 5.718968 5.812436

Trees- Caret

cvCtrl <- trainControl(method = "repeatedcv", repeats = 10)
ans_tree <- train(noise ~ ., data = training, method = "rpart",
                        tuneLength = 100, trControl = cvCtrl)
tree <- eval_model(ans_tree)

tree

[1] 3.342470 4.078996

Boosting- Caret

fitControl <- trainControl(method = "cv", number = 5, verboseIter=F)
gbmGrid <- expand.grid(interaction.depth = c(20,30,45), n.trees = 500, 
                       shrinkage = .1, n.minobsinnode = 10)
ans_boost <- train(noise ~ ., data = training,method = "gbm",
                 trControl = fitControl, verbose = FALSE, tuneGrid = gbmGrid)
plot(ans_boost)

boost <- eval_model(ans_boost)

boost

[1] 0.8433122 1.7874861

Bagging- Caret

# Column 4 is the outcome (noise); exclude it from the predictors
ans_bag <- bag(training[,-4], training$noise, B = 10,
               bagControl = bagControl(fit = ctreeBag$fit,
                                       predict = ctreeBag$pred,
                                       aggregate = ctreeBag$aggregate))
bag <- eval_model(ans_bag)

bag

[1] 0.1815698 0.3206534

Random Forest- Caret

# Column 4 is the outcome (noise); exclude it from the predictors
ans_rf1 <- train(x = training[,-4], y = training$noise, method = "rf",
                 trControl = trainControl(method = "cv", number = 4),
                 do.trace = FALSE, ntree = 250)
rf <- eval_model(ans_rf1)

rf

[1] 0.04534854 0.10859618

Results and Summary

answer <- data.frame(rbind(reg = reg, tree = tree, boost = boost, bag = bag, rf = rf))
names(answer) <- c("ISRMSE", "OSRMSE")

plot(c(reg[1],0),c(0,reg[2]),type='l',col=1,lwd=3,
     xlab="In-Sample-Root-Mean-Squared-Error",
     ylab="Out-of-Sample-Root-Mean-Squared-Error",
     main="Model Root-Mean-Squared-Errors Comparison",
     xlim=c(0.2,5.8),ylim=c(0.2,5.8))
lines(c(tree[1],0),c(0,tree[2]),type='l',col=2,lwd=3)
lines(c(boost[1],0),c(0,boost[2]),type='l',col=3,lwd=3)
lines(c(bag[1],0),c(0,bag[2]),type='l',col=4,lwd=3)
lines(c(rf[1],0),c(0,rf[2]),type='l',col=5,lwd=3)
legend(3.93,6.02, c("Regression","Tree","Boosting","Bagging","Random Forest"),
       col = c(1,2,3,4,5),lwd=c(3,3,3,3,3))

Random Forest has clearly outperformed the rest of the prediction algorithms, both in in-sample error and out-of-sample error.
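For quick reference, the RMSE pairs reported in the preceding sections can be collected into a single ranked table (values transcribed from the outputs above, rounded to three decimals):

```r
# In-sample / out-of-sample RMSEs transcribed from the model runs above.
rmse <- data.frame(
  model  = c("Regression", "Tree", "Boosting", "Bagging", "Random Forest"),
  ISRMSE = c(5.719, 3.342, 0.843, 0.182, 0.045),
  OSRMSE = c(5.812, 4.079, 1.787, 0.321, 0.109)
)
# Rank models from best to worst out-of-sample error
rmse[order(rmse$OSRMSE), ]
```

Random Forest sits at the top of the ranking on both metrics, consistent with the comparison plot.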