Quantitative Methods in Tourism: A Handbook (Aspects of Tourism): 79 - Softcover


Rodolfo Baggio; Jane Klobas

 

Synopsis

In this revised second edition, Baggio and Klobas build upon the work of their previous volume, presenting quantitative research methods for tourism researchers. This accessible and rigorous guide goes beyond the approaches usually covered in introductory textbooks on quantitative methods to consider useful techniques for the statistical study of all but the most econometrically complex tourism questions. The first part of the book covers common issues in the statistical analysis of data and the most widely used techniques, while the second part describes and discusses several newer and less common approaches to data analysis that are valuable for tourism researchers and analysts. Updates in the second edition include:
* a new chapter on 'Big Data'
* consideration of data screening and cleaning
* the use of similarity and diversity indexes for comparing samples
* observations about the partial least squares (PLS) approach to path modelling
* a new section on multi-group structural equation modelling
* a new section on common method variance and its treatment
* revised and updated section on software
* fully updated references and examples


About the Author

Rodolfo Baggio is Professor in the Master of Economics and Tourism programme at Bocconi University, Italy. His research interests focus on information technology and tourism, and his current work applies complexity theory and network analysis methods to the study of tourism destinations. Jane Klobas is an Education and Research Consultant based in Australia and Italy. She is an Adjunct Professor at Murdoch University, Australia, and a Visiting Professor at the University of Bergamo, Italy, and other universities in Europe and Asia. Her interests include research development, adult learning, knowledge and information management, and applications of the theory of planned behaviour.

Excerpt. © Reprinted by permission. All rights reserved.

Quantitative Methods in Tourism

A Handbook

By Rodolfo Baggio, Jane Klobas

Multilingual Matters

Copyright © 2017 Rodolfo Baggio and Jane Klobas
All rights reserved.
ISBN: 978-1-84541-618-8

Contents

Contributors
Foreword
Introduction to the Second Edition
Introduction
Part 1: The Analysis of Data
1 The Nature of Data in Tourism
2 Testing Hypotheses and Comparing Samples
3 Data Reduction
4 Model Building
5 Time-Dependent Phenomena and Forecasting
Part 2: Numerical Methods
6 Maximum Likelihood Estimation
7 Monte Carlo Methods
8 Big Data
9 Simulations and Agent-Based Modelling (Jacopo A. Baggio)
Appendix: Software Programs
Subject Index


CHAPTER 1

The Nature of Data in Tourism


This chapter contains a brief review of the nature of data as used in tourism and hospitality, and discusses the main quality characteristics needed to obtain useful and reliable outcomes from data analysis. A list of the main sources of tourism data is provided.

The protagonist in the adventures described in this book is the datum, better known in its plural form, data. The original Latin meaning, something given (and accepted as true), defines it well. A datum is (usually) a number, the result of some observation or measurement process, that objectively represents concepts or other entities in a form suitable for communication, interpretation or processing by humans or automated systems. By themselves, and out of a specified context, data have no meaning at all; they are merely strings of symbols. Once organised or processed in some way, and associated with other concepts or entities, they become useful information: they acquire relevance and purpose, provide insights into phenomena, and allow judgements to be made and decisions to be taken (readers interested in a discussion of these concepts will find the review by Zins [2007] a good starting point). Turning data into information in this way is precisely the objective of all statistical techniques.

Many disciplines, and tourism is no exception, require large quantities of data. The main challenge researchers face today is managing the huge quantity, variety and complexity of the data available, while making sure that the outcomes obtained are useful and valid.


Data: A Taxonomy

It is possible to categorise data in several ways. One distinction is between primary and secondary data. Another classifies data by their level of measurement or measurement scale. Yet another is the medium or form from which the data are derived. We provide a brief overview of the key issues associated with data of each type here.

The distinction between primary and secondary data is made on the basis of the source of the data and their specificity to the study for which they are gathered. Each type of source has strengths and weaknesses, and these are the focus of our discussion here.


Primary data

Primary data are those directly collected from the original or 'primary' source by researchers through methods such as direct observation (both human observation and automatic collection of data such as clicks on links in websites or through use of other information and communications technology), questionnaire surveys (online, printed or administered by telephone or computer), structured or unstructured interviews and case studies. To be classified as primary data, the data elements collected using any one of these techniques must be unique and tailored to the specific purposes of the study conducted. The most commonly used techniques and their strengths and limitations are well described in many books (Babbie, 2010; Creswell, 2003; Hair et al., 2005; Neuman, 2006; Phillimore & Goodson, 2004; Veal, 2006; Yin, 1994). Here, we concentrate on recent developments and issues of particular relevance to tourism research.

The main disadvantages of primary data are well known: cost and time. Collecting tailored information tends to be expensive in terms of the resources needed (money and people), and it may take a long time to design the research properly and to process the results. Recently, the internet and the world wide web have reduced the cost and time required to conduct surveys. However, unless they are used carefully, online surveys can hide problems related to the representativeness of the sample, and both the technical characteristics of the medium and individual differences among respondents can bias results. Of course, these concerns are not unique to electronic media, but they can be exacerbated by the seductive ease and speed of online data collection. Nevertheless, many survey experts consider internet surveys (provided the sample is representative) to give valid, reliable and relatively error-free results, among other reasons because data are captured directly from the respondent, without the need for an interviewer or assistant to enter them separately into a database for analysis (Dillman, 2007).
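When population figures are known for some characteristic of respondents (for example, the age structure of visitors from official statistics), one simple check of an online sample's representativeness on that characteristic is a chi-square goodness-of-fit test. The sketch below is only an illustration under stated assumptions: the population shares and sample counts are hypothetical, the function name is ours, and 5.991 is the standard chi-square critical value for 2 degrees of freedom at the 5% level.

```python
def chi_square_gof(observed, population_shares):
    """Pearson chi-square statistic comparing sample counts with population shares.

    observed: dict mapping group -> number of respondents in the sample
    population_shares: dict mapping group -> proportion in the population (sums to 1)
    """
    n = sum(observed.values())
    statistic = 0.0
    for group, share in population_shares.items():
        expected = n * share  # count expected if the sample mirrored the population
        statistic += (observed.get(group, 0) - expected) ** 2 / expected
    return statistic


# Hypothetical figures: age structure of visitors from official statistics
# versus the age structure of 200 respondents to an online survey.
population_shares = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
online_sample = {"18-34": 90, "35-54": 80, "55+": 30}

stat = chi_square_gof(online_sample, population_shares)
CRITICAL_5PCT_DF2 = 5.991  # critical value for df = 3 groups - 1 = 2, alpha = 0.05
print(f"chi-square = {stat:.2f}; departs from population at 5% level: {stat > CRITICAL_5PCT_DF2}")
```

In this invented sample, younger respondents are over-represented and older ones under-represented, and the statistic (30.00) is far above the critical value. Note that a non-significant result would not prove the sample representative: it only says that no departure was detected on the variable checked.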

Regardless of the method used to capture primary data, the researcher should understand well all the issues associated with sampling (representativeness and sample size) and with obtaining data of suitable quality. From a practical point of view, it is advisable to start any study by surveying a pilot sample and studying the responses obtained. Participants in the pilot study can be asked to identify any questions that they found difficult to understand or to answer and, using a technique known as cognitive interviewing, they can also be asked how they interpreted specific questions. The data collected from a pilot study can be used to estimate population parameters for the statistical models that will be used to draw conclusions from the final survey; these estimates help determine the likely data distribution and the sample size necessary or desirable for the larger-scale investigation to be conducted effectively (Dillman, 2007; Pan, 2010).
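As one concrete illustration of using pilot estimates, a common textbook formula gives the number of respondents needed to estimate a population mean to within a chosen margin of error E: n = (z·s/E)², where s is the standard deviation estimated from the pilot. The sketch below assumes simple random sampling from a large population and 95% confidence (z = 1.96); the function name and the pilot figures are hypothetical.

```python
import math
import statistics


def sample_size_for_mean(pilot_values, margin_of_error, z=1.96):
    """Respondents needed to estimate a population mean within +/- margin_of_error.

    Uses n = (z * s / E)^2, with s the standard deviation estimated from the
    pilot data. z = 1.96 corresponds to 95% confidence. Assumes simple random
    sampling from a large population.
    """
    s = statistics.stdev(pilot_values)  # sample standard deviation (n - 1 denominator)
    return math.ceil((z * s / margin_of_error) ** 2)  # round up to a whole respondent


# Hypothetical pilot data: daily spending (EUR) reported by five pilot respondents.
pilot_spending = [80, 100, 120, 100, 100]
print(sample_size_for_mean(pilot_spending, margin_of_error=5))   # 31
print(sample_size_for_mean(pilot_spending, margin_of_error=10))  # 8
```

A five-person pilot gives only a rough estimate of s; the point of the sketch is that halving the margin of error roughly quadruples the sample required.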


Secondary data

In many cases, collecting primary data is not within the reach of the investigator. Furthermore, it is not always necessary to have primary data to conduct a study. For example, very few researchers would start collecting primary data on the number of tourists visiting a country or on the gross domestic product (GDP) of some nations. When theoretical or practical reasons do not call for direct collection of data, secondary data are used. Secondary data are data gathered, typically by someone else, for a purpose other than the study for which they will be used. The main sources of secondary data external to an organisation are government agencies (statistical bureaus, public tourism departments), international associations and institutions, private research companies and industry associations. Data from these sources are available directly from the provider (particularly in the case of those public institutions that have an obligation, often by law, to make public the outcomes of their activities) or from libraries and electronic databases. Often, they can be obtained from these sources over the internet. Useful data for some studies can also be found in previously published research or reports. Increasingly, secondary data are drawn from the databases (typically customer or visitor databases) maintained by individual organisations. A special case of secondary data is so-called Big Data, which we discuss in Chapter 8.

Secondary data tend to be readily available and they are often free or inexpensive to obtain. It is often possible to assemble large quantities of data and to draw together data from different sources. On the other hand, secondary data may be more difficult to use and to interpret because, typically, they were gathered by other researchers, or by practitioners, for other purposes. Extracting useful information from a source of secondary data requires an understanding of the structure of the data and the database as well as a good understanding of the characteristics and meaning of each data element. A careful reading of the data specifications is essential in order to judge the suitability of the data for the study under way as well as their reliability and trustworthiness.

When secondary data are drawn from databases in which individuals can be identified (examples include corporate customer databases and data extracted from online social networks), researchers also need to meet criteria for the ethical treatment of data. The most widely accepted criteria are outlined in the Declaration of Helsinki (http://www.wma.net/en/30publications/10policies/b3/), which is maintained by its authors, the World Medical Association, and has been adopted for research in most fields that use data obtained from humans.

As a final point, secondary data are often preprocessed to give summaries, totals or averages (e.g. by country or region), and the original details cannot easily be recovered. The dangers of drawing conclusions about individuals from such preprocessed data are nicely illustrated by Simpson's paradox: relationships observed at the aggregate level are not necessarily the same (or even in the same direction) as relationships observed at the level from which the data were aggregated. Figure 1.1 shows the relationship between two variables, let's say hours of sunlight (on the x axis) and visitor numbers (in thousands, on the y axis). The two lines show the relationship between hours of sunlight and daily visitor arrivals over Christmas in four cities in two countries (say, Australia and Denmark), one line for each country. Both lines show a positive relationship: the more sunlight in a city at Christmas time, the more visitor arrivals recorded. The dotted line, however, shows the relationship between the average number of hours of sunlight and the average visitor numbers for the cities in each country. On average, the cities in one country have around 10 hours of sunlight and around 3000 visitors, while the cities in the second country have around 2 hours of sunlight and 7000 visitors. The dotted line therefore shows a negative relationship between hours of sunlight and visitor arrivals. Which relationship is the right one? Which one should be used for planning?
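The reversal can be reproduced numerically. The sketch below uses invented city figures (two cities per country), chosen only so that the country averages match those quoted in the text; they are not the data behind Figure 1.1. It fits an ordinary least-squares slope within each country and then across the two country averages.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical (hours of sunlight, visitors in thousands) for two cities per country.
cities = {
    "Australia": [(9, 2.5), (11, 3.5)],
    "Denmark": [(1, 6.5), (3, 7.5)],
}

# Within each country: more sunlight goes with more visitors.
within = {
    country: ols_slope([s for s, _ in pts], [v for _, v in pts])
    for country, pts in cities.items()
}

# Aggregated to country averages: the sign of the relationship reverses.
averages = [
    (sum(s for s, _ in pts) / len(pts), sum(v for _, v in pts) / len(pts))
    for pts in cities.values()
]
aggregate = ols_slope([s for s, _ in averages], [v for _, v in averages])

print(within)     # {'Australia': 0.5, 'Denmark': 0.5}
print(aggregate)  # -0.5
```

Within each country, one extra hour of sunlight goes with about 500 more visitors; across the aggregated figures, the same extra hour goes with about 500 fewer. Someone holding only the country averages would see the second relationship and could not recover the first.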


Combining primary and secondary data

In many cases, the data used for a study come from different sources and a combination of primary and secondary data is quite common in tourism studies. In addition to the specific considerations of primary and secondary data, the researcher needs to keep in mind the nature of the sample and the level of aggregation of data from the different sources. Specific techniques may be needed to ensure that results are useful and to avoid errors.
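One practical consequence of differing aggregation levels is that sources can only be combined at the coarsest level they share: individual survey records can be aggregated up to regions, but regional totals cannot be broken down to individuals. The sketch below shows this under assumed conditions. All records, region names and figures are hypothetical, and the two helper functions are ours. Regions found in only one source are reported rather than silently dropped, and the number of respondents behind each regional mean is kept so that thinly supported means can be spotted.

```python
from collections import defaultdict

# Hypothetical primary data: one record per survey respondent.
survey = [
    {"region": "North", "satisfaction": 4},
    {"region": "North", "satisfaction": 5},
    {"region": "North", "satisfaction": 3},
    {"region": "South", "satisfaction": 2},
    {"region": "South", "satisfaction": 4},
    {"region": "East", "satisfaction": 5},
]

# Hypothetical secondary data: official arrivals, already aggregated by region.
arrivals = {"North": 120_000, "South": 85_000, "West": 60_000}


def aggregate_by_region(records, field):
    """Collapse individual records to regional (mean, count) pairs."""
    totals, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        totals[rec["region"]] += rec[field]
        counts[rec["region"]] += 1
    return {r: (totals[r] / counts[r], counts[r]) for r in totals}


def combine(records, regional_stats):
    """Join the two sources at the level they share (the region)."""
    survey_means = aggregate_by_region(records, "satisfaction")
    combined = {
        region: {"arrivals": regional_stats[region], "mean_satisfaction": mean, "n_respondents": n}
        for region, (mean, n) in survey_means.items()
        if region in regional_stats
    }
    unmatched = set(survey_means) ^ set(regional_stats)  # in one source only
    return combined, unmatched


combined, unmatched = combine(survey, arrivals)
print(combined)
print(sorted(unmatched))  # ['East', 'West']
```

Any conclusion drawn from the combined table is a statement about regions, not about individual visitors.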


Data Harmonisation, Standards and Collaboration

Use of secondary data, particularly when they are obtained from multiple sources, can be greatly aided by harmonisation and standards. International organisations harmonise data they draw from different countries as best they can, and record any country-specific variations from the standard data definition (such as year of data collection or age groups from which data are collected) in data specifications or metadata, but there is no universally recognised or adopted standard for many of the concepts that are important for tourism studies. There have been many attempts at standardisation, and international institutions have published several recommendations (see for example Eurostat, 2000, 2002; UNSD, 2010; UNWTO, 2000), but in many cases local variations make complete harmonisation very difficult, if not impossible. A good example is the classification of hospitality establishments. Almost all countries (and often even regions within the same country) have developed their own schemes, so comparing hotels in different areas of the world can become a difficult task (see for example Cser & Ohuchi, 2008; Hotelstars, 2010; IHRA, 2004).

Moreover, when it comes to electronically distributed data, the limited adoption of even the existing technological standards makes data collection and comparison more difficult still. A key issue is that different software applications and heterogeneous computing platforms need a way to exchange data automatically, without much human intervention. Interoperability between systems (or, rather, the lack of it) is a problem that is most obvious in large online commercial environments, but it also significantly affects the ability to extract and use data for research purposes. Many international efforts try to overcome this problem by setting standards for the representation and exchange of electronic data in tourism. Probably the best known and most widely adopted is the proposal made by the Open Travel Alliance (OTA: http://www.opentravel.org/), a consortium of many important companies active in both the tourism and the information technology fields. The work is done at two levels. The first level concerns semantics: standard definitions and names for the different objects involved (a trip, a destination, a hotel, a room etc.) are set by building an ontology (i.e. an agreed classification and definition scheme) (Gruber, 1993). The second concerns the technical means to store and transfer data. One proposal is the use of a service-oriented architecture (Erl, 2005) based on eXtensible Markup Language (XML) standards (see http://www.w3.org/XML; Harold & Means, 2004). Commercial software also exists; the Nesstar system (http://www.nesstar.com) is used by a number of national and international bodies.
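To show why agreement at both levels matters, the sketch below reads an XML message with Python's standard `xml.etree.ElementTree` module. Because sender and receiver agree on the names of the objects (the semantic level) and on the syntax (XML, the technical level), a program can extract the data with no human intervention. The message format, hotel names and codes are invented for illustration and do not reproduce the actual OpenTravel schema.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical message. Element and attribute names are invented
# for illustration; they do NOT reproduce the actual OpenTravel (OTA) schema.
MESSAGE = """
<HotelAvailability>
  <Hotel code="H001" name="Hotel Aurora" city="Milan">
    <Room type="double" available="3" rate="120.00" currency="EUR"/>
    <Room type="single" available="0" rate="85.00" currency="EUR"/>
  </Hotel>
  <Hotel code="H002" name="Lakeside Inn" city="Como">
    <Room type="double" available="5" rate="140.00" currency="EUR"/>
  </Hotel>
</HotelAvailability>
"""


def available_rooms(xml_text):
    """Return (hotel code, room type, rooms available, nightly rate) for rooms on offer."""
    root = ET.fromstring(xml_text.strip())
    offers = []
    for hotel in root.findall("Hotel"):  # direct children named Hotel
        for room in hotel.findall("Room"):
            count = int(room.get("available"))
            if count > 0:
                offers.append((hotel.get("code"), room.get("type"), count, float(room.get("rate"))))
    return offers


for offer in available_rooms(MESSAGE):
    print(offer)
```

If one partner renamed `Room` to `Unit`, or sent availability as free text, the same program would silently find nothing. This is the interoperability problem in miniature.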

An associated development is the increasing attention being paid to making the data on which research results are based openly and publicly available. Several major publishers already offer authors the possibility of making their original data available as an online supplement to a published journal article, and the Scholarly Publishing and Academic Resources Coalition (SPARC, http://www.sparceurope.org), which brings together major research libraries and peak university bodies, acts as an advocate of this open data model. Nonetheless, original data are still rarely made available, not only in tourism but in many other fields as well. An old joke in the life science research community is that 'the data are mine, mine, mine', and papers have been written on the subject (Campbell et al., 2002).

One more effect of the widespread use of computers in research is the increasing use of computational models and simulations. This not only increases the types of data available, but also complicates the picture, because specific information about the algorithms, the software and the parameters used to set up a model run is needed to repeat or evaluate the results. The reproducibility of findings is central to the scientific enterprise, and one key element is that the data and procedures used are available for examination and inspection by the wider community of researchers. The increasing use of computer simulations therefore worsens the problem: the software and algorithms that underlie these artefacts need to be well verified and understood, and the generic description normally given in a paper or report is not enough for this. In some fields, particularly in the sciences, researchers are setting up environments known as collaboratories to share this critical information, but as yet there is no collaboratory for tourism data, software and routines (Sonnenwald, 2007).
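A small habit that helps is to store, with every simulation result, the information needed to rerun it: the model, its parameters, the random seed and the software environment. The sketch below assumes a toy repeat-visit model invented for illustration (it is not a model from this book) and records this provenance as JSON, a plain format that can be archived next to the results or shared with other researchers.

```python
import json
import platform
import random


def repeat_visit_model(n_visitors, p_return, seed):
    """Hypothetical toy simulation: how many of n_visitors return, each with probability p_return."""
    rng = random.Random(seed)  # private generator, fully determined by the seed
    return sum(1 for _ in range(n_visitors) if rng.random() < p_return)


def run_with_provenance(params, seed):
    """Run the model and return the result together with what is needed to rerun it."""
    result = repeat_visit_model(seed=seed, **params)
    return {
        "model": "repeat_visit_model",
        "parameters": params,
        "seed": seed,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "result": result,
    }


record = run_with_provenance({"n_visitors": 1000, "p_return": 0.3}, seed=42)
print(json.dumps(record, indent=2))
```

Rerunning with the same parameters and seed gives the same result. The software version is recorded too because some library routines may change between releases, and a reader trying to reproduce a run needs to know which one was used.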

We conclude this section with an example of how one journal has addressed the issue of replicability. The Journal of Applied Econometrics states: 'A special feature of the Journal is its emphasis on the replicability of results by other researchers. To achieve this aim, authors are expected to make available a complete set of the data used as well as any specialized computer programs employed through a readily accessible medium, preferably in a machine-readable form'. Hopefully, other journals will follow this example, and modern information technologies, standards for recording and sharing data, and a greater willingness to use them will help increase the visibility and accessibility of data and algorithms.


Quantitative and categorical data

Another fundamental distinction between types of data concerns the level of measurement of the data. This distinction is well covered in statistical textbooks, but it is so critical to the selection of appropriate statistical techniques and is so often ignored that we want to draw attention to it here. We will distinguish primarily between quantitative or metric data and categorical data. Quantitative data are data that are measured using a numerical scale that reflects a quantity such as temperature, height, cost, age (in years), percentage satisfaction with an experience and so on. Categorical data represent qualities or characteristics that can be used to categorise a person or object; examples are sex, age group, country, approval or disapproval of a policy or plan and so on. While categorical data can be included in statistical analyses, they often require special treatment if the results are to be meaningful.
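One common form of special treatment is dummy (indicator) coding. If countries were coded 1, 2 and 3 and entered into a regression as if they were quantitative, the model would assume an order and equal distances between countries that do not exist. The hypothetical helper below (plain Python, written for illustration) instead creates one 0/1 column per category, except for a chosen reference category.

```python
def dummy_code(values, reference):
    """Turn a categorical variable into 0/1 indicator (dummy) columns.

    The reference level gets no column of its own: in a model with an intercept,
    each dummy's coefficient is then read as a difference from the reference.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")
    columns = [level for level in levels if level != reference]
    rows = [[1 if v == level else 0 for level in columns] for v in values]
    return columns, rows


# Hypothetical data: country of origin of four respondents.
origin = ["Italy", "Australia", "Denmark", "Italy"]
columns, rows = dummy_code(origin, reference="Italy")
print(columns)  # ['Australia', 'Denmark']
print(rows)     # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Dropping the reference column avoids perfect collinearity with the intercept (the so-called dummy variable trap); which level is used as the reference changes how the coefficients are read, not how well the model fits.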


The many forms of data

One more important element when considering data is the different forms in which they are recorded and expressed. Data can be found in forms such as simple numerical quantities, high-dimensional data (i.e. multivariate data that can only be defined by recording observations on several variables), geometric or spatial records, images, graphs, texts, maps and geographical representations (Shoval & Isaacson, 2010). Advances in modern technologies have led us to a world in which practically every object of interest can be put in digital form. Overlaid on any one of these there is often a temporal dimension that multiplies quantities and types to be managed by a factor equal to the number of time steps of interest; this may result in very large data sets.


(Continues...)
Excerpted from Quantitative Methods in Tourism by Rodolfo Baggio, Jane Klobas. Copyright © 2017 Rodolfo Baggio and Jane Klobas. Excerpted by permission of Multilingual Matters.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

