How do you choose the best linear regression model?
There are a number of methods and considerations for choosing the “best” regression model, regardless of the software or R package you are using. The guidelines below apply equally to R, Python, SPSS, or any other statistics package.
Some packages are easier to use than others, and different fields have their own preferences (economists tend to prefer Stata, social scientists tend to prefer SPSS, and lots of people like open source products such as R or Python), but they all do much the same thing in terms of the math. Similarly, R has regression routines in its base installation, but there are also lots of specialized packages that are meant to make life easier (assuming that the package will load).
Picking the “best” model is really up to the analyst, regardless of the package that is used.
· What are you predicting? If it is a continuous variable (within some range), then you want to use some form of multiple regression. If it is the probability of some event occurring (e.g. whether a student will graduate), then you want to use some form of logistic regression. There are also more specialized routines (e.g. hierarchical regression), used for particular purposes.
· Consider the theory behind your regression model when choosing independent variables. This might be derived from previously published research, subject matter experts, or just common sense.
· You may want to use indicator variables (“dummy variables”) as well as numeric variables. These represent categorical (non-numeric) variables such as gender, typically coded as 0 or 1.
· Different variables can be tested, based on theoretical considerations or exploratory data analysis. Examining scatter plots of the possible independent variables against the dependent variable can help. Also check whether your variables fit the assumptions of regression analysis.
· The above analysis might lead you to perform data transformations, either to linearize the relationship with some variables (e.g. you might have to test a quadratic term) or to ensure that you aren’t violating any assumptions (e.g. a log transform if the data is skewed). Some R packages will do these transformations if you request them. Other times you might have to do them yourself in a data step.
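· Once you have settled on a reasonable set of variables, test your model. You will usually be interested in the (adjusted) R-square of the model, and how that changes depending on which variables are in the model. A higher R-square means that the model fits the data better, but plain R-square never goes down when you add a variable, even a useless one. That is why the adjusted R-square, which penalizes extra variables, is the better guide when comparing models of different sizes. You can also use a partial F-test to see if adding a variable is justified.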
· There are a number of standard model building methods, such as forward selection, backward selection or stepwise selection. They all have their advantages and disadvantages (and may go in and out of fashion over time).
· You also have to look out for multi-collinearity, where two or more independent variables are highly correlated with each other. There are diagnostics for this (e.g. VIF, the variance inflation factor) as well as warning signs that point to multi-collinearity (e.g. a coefficient has a counter-intuitive positive or negative sign).
· Consider interaction effects, where the effect of one independent variable on the dependent variable depends on the value of another variable. Keep in mind that models with a lot of interactions (especially 3-way or more) are difficult to interpret and explain to others.
· Examine outliers to see if there are some that are having an inordinate effect on the regression. There are diagnostics such as DFFITS (called DFITS in some packages) and DFBETAs for that.
· It can be interesting to compare traditional statistical modelling to newer data science techniques. You would expect them to come to similar conclusions, though the “black box” nature of many machine learning techniques can make the comparisons difficult to do.
· Try not to take criticism of your model personally. There are so many possibilities for a complex model that there is bound to be disagreement.
· And always remember the phrase “All models are wrong, but some are useful” (usually attributed to G. Box).
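To make a few of the points above concrete, here is a minimal sketch in plain Python of three of the checks mentioned: R-square versus adjusted R-square, a partial F statistic for an added variable, and VIF for multi-collinearity. Everything in it is an assumption made for illustration: the data is simulated, and the helper names (solve, fit_ols, partial_f, vif) are made up for this example. In real work you would use your package's own regression routine rather than hand-rolled code like this.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(columns, y):
    """Least-squares fit of y on the predictor columns, with an intercept added."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])  # number of coefficients, including the intercept
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)  # penalizes extra coefficients
    return {"beta": beta, "rss": rss, "r2": r2, "adj_r2": adj_r2, "n": n, "p": p}

def partial_f(small, big):
    """F statistic for the extra variables in `big` relative to the nested model `small`."""
    q = big["p"] - small["p"]
    return ((small["rss"] - big["rss"]) / q) / (big["rss"] / (big["n"] - big["p"]))

def vif(columns, j):
    """Variance inflation factor: regress predictor j on all the other predictors."""
    others = [c for k, c in enumerate(columns) if k != j]
    return 1 / (1 - fit_ols(others, columns[j])["r2"])

# Simulated data (made up for this example)
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.1) for a in x1]  # nearly a copy of x1
x3 = [random.gauss(0, 1) for _ in range(n)]  # noise, unrelated to y
y = [2 + 3 * a + random.gauss(0, 1) for a in x1]

base = fit_ols([x1], y)
with_noise = fit_ols([x1, x3], y)
print(f"R-square:          {base['r2']:.4f} -> {with_noise['r2']:.4f}")
print(f"Adjusted R-square: {base['adj_r2']:.4f} -> {with_noise['adj_r2']:.4f}")
print(f"Partial F for adding x3: {partial_f(base, with_noise):.2f}")
print(f"VIF of x1 alongside x2 and x3: {vif([x1, x2, x3], 0):.1f}")
```

The partial F statistic would be compared against an F distribution with q and n - p degrees of freedom (where q is the number of added variables and p is the number of coefficients in the larger model). A common rule of thumb treats a VIF above roughly 5 or 10 as a sign of trouble, though that cutoff is a convention rather than a hard rule.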