Audience Dialogue

Audience research software

A guide to some of the widely available software packages, with comments on their suitability for various types of research.

If you're a statistician this page may seem insultingly over-simplified - but it's not aimed at you. For a thorough overview of research software, try this Glasgow University stats site.

With audience research (as with social and market research) there are three types of things that software needs to do:
(1) Get data into a computer (input)
(2) Process data (analysis)
(3) Get data out of the computer, in a form which human brains can understand (output).

The data comes in two quite different varieties:
(a) The quantitative data (numbers and codes) produced by questionnaires;
(b) The words produced by qualitative research.

Because most software only handles words or numbers, not both, this means there are six types of software to consider:

          Quantitative              Qualitative
Input     1. Data entry             4. Text entry
Analysis  2. Statistical analysis   5. Qualitative analysis
Output    3. Statistical output     6. Text output

It would be nice to divide this review neatly into the six types above, but the available software doesn't fall into those neat groups. So the page is in this order:

Quantitative

1. Software for coded data entry
2. Statistical software
3. Survey tabulation programs
4. Software for web surveys
5. Using spreadsheets and database programs for surveys

Qualitative

6. Software for qualitative data entry
7. Software for qualitative research

And this review ends with some notes on
8. How to choose software.

(1) Software for coded data entry

This sounds incredibly technical, but what it actually means is typing in all the answers given in a survey. If you've only used spreadsheet or database software before, you may not have encountered coded data entry. Because most survey questions are multiple-choice, and a respondent is asked to choose from a limited range of answers, the tradition in surveys has been to give each possible answer a code - usually a single-digit number. 1 means the first possible answer to the question, 2 is the second listed, and so on.

For example, the survey might include the question:

Q26. How much do you like reading this page? To answer, tick the appropriate box.
[]1 Not at all
[]2 A little
[]3 A lot
[]4 Don't know till I've finished it

Notice the number beside each box? Instead of keying in "not at all", the data entry operator keys in 1. A "codebook" is set up in a computer file, which tells the survey software that "1" in Q26 means "not at all".

The survey software turns the codes back into full wording when the results are displayed. This method of coding produces small data files, fast processing, and very fast data entry. But it's also possible to make serious mistakes. If the first possible answer for every question is 1, you can easily miss a question or enter the same answer twice. Such errors are prevented by good coding design, and good data entry software. If the operator tries to answer "5" to that question, the software should beep, display an error message, and refuse to continue.
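
As a rough illustration of the idea (not how any particular package does it), here is a minimal Python sketch of a codebook with entry checking. The dictionary layout and the function name validate_entry are assumptions made for this example; only the Q26 wording comes from the question above.

```python
# Hypothetical codebook: question id -> {code: full answer wording}.
CODEBOOK = {
    "Q26": {
        1: "Not at all",
        2: "A little",
        3: "A lot",
        4: "Don't know till I've finished it",
    },
}


def validate_entry(question, code):
    """Return the full wording for a code, or raise ValueError if it is not allowed."""
    answers = CODEBOOK[question]
    if code not in answers:
        allowed = ", ".join(str(c) for c in sorted(answers))
        raise ValueError(f"{question}: code {code} is not valid (allowed: {allowed})")
    return answers[code]


# The operator keys in 1; the software turns it back into wording.
print(validate_entry("Q26", 1))

# The operator keys in 5; the entry is refused.
try:
    validate_entry("Q26", 5)
except ValueError as err:
    print("Rejected:", err)
```

Real data entry software does much more than this (skip checks, double entry, and so on), but the principle is the same: the codebook decides what the operator is allowed to type.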

Most of the statistical programs allow spreadsheet-style data entry. After a survey has been done, with printed questionnaires, you can sit down and enter all the results into this spreadsheet. It works, but it's not very efficient. And it's very easy to make a mistake.

Of the statistical packages listed below, SPSS has a data entry module, but that costs a lot extra. Epi Info shines here: it has a built-in data entry system which enables extensive checking - though not as thorough as a purpose-built data entry program.

Computer-based interviewing systems

Many statistical packages seem to assume that data from a survey has already been entered into a computer system, somehow. This was a safe assumption in the time when printed questionnaires were filled in by interviewers, and data was punched onto cards. But these days things are different - and not all the software has caught up. A lot of this software has been around for decades. I first used SPSS in the late 1970s, and apart from the window interface, many aspects of it haven't changed.

These days, most interviews are done by phone. The interviewer sits in front of a computer, reading questions off the screen and typing the answers in directly, without ever seeing a printed questionnaire. And with more and more people able to use computers, the respondents themselves can sit at a computer and answer the questions. Typically, each question occupies a full screen. When the answer is given, the screen is cleared and the next question appears. Elaborate branching is thus possible. Instead of everybody being asked the same questions, you can design a tree-like questionnaire, of which most respondents might see only a small part. Some programs which enable this are:

Sawtooth Ci3 and Sensus Q&A

Produced by Sawtooth Software, one of the pioneers of CATI (computer-assisted telephone interviewing), as it used to be called. Powerful programs, designed for regular users.

Scytab

Surveycraft (recently acquired by SPSS) has produced an incredibly powerful program. You want to enter dates in the Japanese Imperial Calendar? No problem. But data entry can be surprisingly complex, as I discovered when I designed a customized CATI program. Scytab seems to do it all - but the disadvantage is the learning time involved. Not for casual users.
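
To show what the tree-like branching described earlier in this section amounts to, here is a small Python sketch. It is not based on any of the programs named here: the questions, the codes, and the function name route are all invented for illustration.

```python
# Hypothetical questionnaire: for each question, the coded answer decides
# which question comes next. None means the interview is over.
QUESTIONS = {
    "Q1": {"text": "Did you listen to the radio yesterday?",
           "next": {1: "Q2", 2: "Q3"}},            # 1 = Yes, 2 = No
    "Q2": {"text": "Which station did you listen to most?",
           "next": {1: "Q3", 2: "Q3", 3: "Q3"}},   # three station codes
    "Q3": {"text": "Do you own a television?",
           "next": {1: None, 2: None}},
}


def route(answers, start="Q1"):
    """Return the list of questions a respondent sees, given their coded answers."""
    path = []
    current = start
    while current is not None:
        path.append(current)
        code = answers[current]
        current = QUESTIONS[current]["next"][code]
    return path


# A listener is asked all three questions; a non-listener skips Q2.
print(route({"Q1": 1, "Q2": 3, "Q3": 1}))
print(route({"Q1": 2, "Q3": 2}))
```

A full interviewing system would also display each question, check the typed answer, and store it, but the routing table is the heart of it.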

(2) Statistical software

SPSS

SPSS is by far the most widely used software in social research. It's been around for a long time, seems to be used in every university, and will do practically anything you can think of in statistics. Its manuals are comprehensive, clear, and well-indexed - I suspect they're a large part of the reason for the success of SPSS. It has always been reasonably easy to use (compared with other statistical programs) but until recently has been fairly slow, and its output has looked messy - not suitable for including in reports directed to non-statisticians. Around 1996, with version 7, the usability of SPSS took a leap forward. Even so (and it's now up to version 10) the interface could still stand a lot of improvement.

The main problem is the price - close to 1000 US dollars for the basic system. You can do a lot with the basic system, but if you want to use any advanced statistical techniques, you have to buy more modules. The complete system costs around ten thousand dollars. At the other extreme, there's an annoyingly limited student version, available (to students) for about 50 dollars.

On top of the initial price, you have to pay an annual fee to use it. If you forget to renew the license, SPSS stops working.

For a while, a Mac version of SPSS was available, but it had lots of problems. Serious Mac users of SPSS would run it under Soft Windows. But a Ukrainian team has been beavering away, and in late 2000, version 10 for Macintosh is being released.

Epi Info

Epi Info is designed for epidemiologists, so it includes a slightly different set of statistical procedures from most of the other programs: mapping tools for measuring the spread of epidemics, and so on. I've found Epi Info very useful for audience research: audiences spread in much the same way as a contagious disease! The two big advantages of Epi Info are
(a) it's free (it was funded by the World Health Organization), and
(b) it's available in 13 languages.

Courses on it are taught at schools of public health in many countries. Its main disadvantage is that it's not good at labelling data. Also, it's only available for MS-DOS. It fills 4 floppy disks, including the manual, but the key programs fit on a single disk.

News flash! July 2000: The Windows version, Epi Info 2000, has just been released. At first glance, it seems harder to use, but more powerful than the DOS version. It's a 37 megabyte download, so its former slimness has gone. Don't think of running this on an old PC. I'll use it on a forthcoming project, and report in more detail in a few months.

Unlike SPSS (which only deals with the actual statistical analysis, unless you buy extra modules), Epi Info is a complete statistical system. It includes a word processing program for producing questionnaires and writing reports, and has a powerful data entry program which can detect many errors as soon as they're typed in.

Statistica

This program, similar to SPSS in scope and cost, seems easier to use than SPSS. (I haven't used the real thing, only a demo version.) Its particular strength is its graphing capability. There's a slightly cheaper cut-down version called Quick Statistica, which would handle most people's needs.

SAS

SAS is at heart a database system, which also does statistics. The complete set of manuals is a horrifying sight: it sprawls for a whole metre across a bookcase. The particular strength of SAS is handling large samples. A million cases is nothing to SAS. It's more of a mainframe program, though there is a PC version. It's not difficult to use for basic statistics, but to me it feels like using the end of an elephant's tail as a paintbrush. Recent versions include a useful "wizard" interface, which helps inexpert users work out which statistical procedures they should use.

Statview

Once this was a Macintosh-only program, but now there's a Windows version as well. The user interface is rather different from other statistical software I've seen: confusing at first, but when you get used to it, its advantages become more obvious. Statview is very fast, and as its name implies it can produce a wide variety of graphs.

Datadesk and JMP

The strength of these two programs is in exploratory statistics, displayed in graphic form. Both of these help you to understand your data better - something the heavier-duty programs such as SPSS often fail to do.

Other general-purpose statistical programs

The peculiarity of survey data, compared with other kinds, is that most survey data is measured at a fairly primitive level: mostly nominal, occasionally ordinal, but seldom ratio data.

(If you didn't understand that, maybe you need to learn more about statistics. Consult our Statistics page.)

However, most statistical programs are designed around ratio or interval data. Even SPSS, designed specifically for the social sciences, expects its input to be in numeric form. There's nothing numeric about a survey question like "Which TV programs did you watch yesterday?" However, to do statistical analysis, each TV program has to be allocated an arbitrary number. Statistical programs, all of which pride themselves on their numerical accuracy, will then report the mean TV program to an accuracy of at least 8 decimal places - a meaningless figure.
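
A small Python sketch makes the point. The program names, codes, and responses are invented for illustration; the function frequency_table is likewise just a name chosen for this example.

```python
from collections import Counter
from statistics import mean

# Hypothetical arbitrary codes for TV programs (nominal data).
PROGRAMS = {1: "News", 2: "Drama", 3: "Sport", 4: "Quiz show"}
responses = [1, 3, 3, 4, 2, 3, 1, 4]


def frequency_table(codes, labels):
    """Return (label, count, percentage) rows, most frequent first."""
    counts = Counter(codes)
    return [(labels[code], n, 100 * n / len(codes))
            for code, n in counts.most_common()]


# The "mean TV program": easy to compute, impossible to interpret.
print(mean(responses))

# What nominal data actually supports: counts and percentages.
for label, count, pct in frequency_table(responses, PROGRAMS):
    print(f"{label}: {count} ({pct:.1f}%)")
```

Renumber the programs and the mean changes, while the frequency table stays exactly the same. That is the sign that the mean was never telling you anything.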

If you're dealing with data where the difference between 1.000012 and 1.000013 is truly meaningful, and big numbers give you a sort of tingling feeling in the brain, I recommend these heavy-duty number-crunchers - the software that real statisticians use:

  • Stata. The latest version is "intercooled" - i.e. it's like a truck, complete with DOS-like interface.
  • S-Plus.

(3) Survey tabulation programs

A slightly different class of programs is those intended only for processing survey data. They tend to be more limiting than the statistical programs, but can be easier to use. They're designed for use by market research companies, and their strength lies in producing tables of numbers rather than doing statistical analysis. Many of these programs are described in the Market Research Software Archive, which also offers downloadable demos for most of them.

Like the statistical packages described above, most of these tabulation programs are designed to be used constantly. If you only use them every few months, you forget the details, and get the same problems over and over again. Three that I've used and found OK are:

Statpac

Statpac is available only in a DOS version (Statpac Gold IV), but it's a no-fuss piece of software with which you can get a lot of work done in a short time. There are rumours of a Windows version coming out soon.

The Survey System

This is more of a market research system than a statistics system - though the differences are becoming less marked. Its strength is in producing extra-wide "banner tables" that market researchers love (but most other people have trouble reading accurately).

Like the statistical packages, the tabulation programs aren't cheap: mostly around 1000 US dollars. I've tried various shareware tabulation programs, but haven't yet found one without serious bugs and limitations. If you know of a good one, please tell me about it, and I'll include it here.

(4) Software for web surveys

I'm still working on this, looking into what's available. So far Powertab (Mac) and Perseus Survey Solutions look the most promising. I'm also trying out a few others, and the big companies like SPSS are getting ready to wheel out their web-based software.

(5) Using spreadsheets and database software for surveys

Spreadsheets

All the main spreadsheet programs - Excel, Lotus, and Quattro Pro - have statistical routines built in. But they are cumbersome to use, and it's all too easy to make a serious mistake - and never know it. Missing data can be a problem with spreadsheets, too: if you leave a cell blank when a respondent couldn't answer a question, the cell is usually ignored. With survey and statistical programs, you can make the often-important distinction between "don't know", "not applicable", and "not answered".
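
Here is a minimal Python sketch of that distinction. The special codes 7, 8, and 9 are an assumed convention for this example, not something prescribed by any of the packages mentioned; the function name summarize is also hypothetical.

```python
from collections import Counter

# Hypothetical codes for one question: substantive answers, and three
# different kinds of "missing" that a blank spreadsheet cell would lump together.
ANSWERS = {1: "Not at all", 2: "A little", 3: "A lot"}
MISSING = {7: "Don't know", 8: "Not applicable", 9: "Not answered"}


def summarize(codes):
    """Return (percentages of valid answers, counts of each kind of missing data)."""
    counts = Counter(codes)
    valid = {ANSWERS[c]: n for c, n in counts.items() if c in ANSWERS}
    missing = {MISSING[c]: n for c, n in counts.items() if c in MISSING}
    valid_total = sum(valid.values())
    if valid_total == 0:
        return {}, missing
    percentages = {label: 100 * n / valid_total for label, n in valid.items()}
    return percentages, missing


percentages, missing = summarize([1, 2, 2, 3, 9, 7, 2, 8, 9, 3])
print(percentages)  # based only on the six valid answers
print(missing)      # each kind of missing data reported separately
```

The percentages are based on valid answers only, while the three kinds of missing data are still counted and can be reported in their own right.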

Excel now has pivot tables and a Frequency command, but both are cumbersome to use, compared with software like Epi Info or SPSS. If you're stuck with Excel, install the Data Analysis Toolpak and learn how to use that: it's the most trouble-free of the various ways of doing statistics with Excel. These notes by Eva Goldwater are helpful.

Another problem with spreadsheets is that they're weak on data entry. Most statistical software will let you enter answers to survey questions in time-saving coded form (e.g. "M" for male) and check the validity of answers as you type them in. Though the latest spreadsheets (e.g. Excel 97) will allow that, it's not easy to set up, and just because you select a value with a mouse doesn't always mean it was the correct value.

I find the biggest problem with spreadsheets is that it's horrifyingly easy to make mistakes - especially when you're modifying an existing spreadsheet for a new purpose. Two tricks I've found that help a lot are

  • showing entered and calculated cells in different colours
  • calculating everything in two different ways, then comparing the results, and showing the difference between the two in another cell. When the difference is not zero, you know you've got a problem.

So I don't recommend using a spreadsheet for statistics unless you're thoroughly familiar with the software, e.g. a regular user of functions, macros, and so on.
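
The second trick (calculating everything two ways) can be sketched in Python as follows. The table of counts is invented. In a spreadsheet the two totals would be separate formulas over separate cell ranges, which is where the mistakes creep in; the code just shows the shape of the check.

```python
# Hypothetical table of counts: rows are age groups, columns are answers.
table = [
    [12, 30, 8],
    [20, 25, 5],
    [15, 10, 25],
]


def grand_total_by_rows(t):
    """Add up each row first, then add the row totals."""
    return sum(sum(row) for row in t)


def grand_total_by_columns(t):
    """Add up each column first, then add the column totals."""
    return sum(sum(col) for col in zip(*t))


def check_totals(t):
    """Return the difference between the two totals; anything but zero means trouble."""
    return grand_total_by_rows(t) - grand_total_by_columns(t)


print(grand_total_by_rows(table), grand_total_by_columns(table), check_totals(table))
```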

The University of Leeds has an excellent summary of spreadsheets and what they can do.

Using database programs for statistical analysis

Databases are better than spreadsheets for data entry, but weaker for analysis. Some programs I've tried are Filemaker Pro, Lotus Approach, Microsoft Access, and several variants of Dbase, such as Foxpro. These programs let you set up validity checking for data entry, but can't manage coded data efficiently.

And when all the questionnaires have been entered on the database, you then face the problem of summarizing the results. Database software is mostly designed for business record-keeping, so these programs have extremely limited ability to find patterns in data - which is what statistics is about.

(6) Qualitative data entry

"Qualitative data" is another name for words. Vast numbers of them, perhaps, and perhaps arranged in a complex structure - but still just words.

The data entry software for qualitative data is usually a text editor (such as Wordpad in Windows 95), or a word processor, with data typically saved in ASCII format.

If you have long interviews or discussions to be transcribed verbatim, and you have a powerful computer (i.e. a recent Pentium), and you are a slow typist, consider using one of the newly-improved speech recognition programs. There are three main competitors, most of which come in several different varieties (of steadily increasing power and cost).

With these, you "train" the computer to understand your voice, then speak into a microphone. (This is how the ads express it - but in fact the computer is training you!) The words appear on the screen, and you can type in corrections at any time. I know people who are using this type of software for data entry of group discussions. The trained computer understands only its master's voice, so the procedure is to play back a sentence or so of a taped interview, repeat it into the microphone, listen to another sentence from the tape, and so on.

Speech recognition software is progressing rapidly. What I've just written is based on my experiences in mid-1998, but all the above programs are improved now - and require more powerful computers. You need at least a Pentium II or equivalent, with lots and lots of memory. Until recently there has been no voice-recognition program for the Macintosh - which is strange, because the earliest Macs had speech synthesis software (the other direction: text to talk), but now ## is available in a Mac version.

(7) Software for analysing qualitative data

Like statistical software, qualitative software can take a lot of time to learn. If you're going to spend years working on your PhD, it will be time well spent. But if all you want to do is a simple study for your radio station, the time you'd spend learning to use something like Nud*ist would be disproportionate.

And bear one thing in mind: unlike statistical software, which does analysis for you, the qualitative software doesn't actually analyse. It simply makes it easier for you to do that. A lot of skilled qualitative researchers manage perfectly well without such software: they use scissors, paste, large areas of floor, lots of index cards - plus plain old insight. That insight is gained by immersing yourself in the data, till you know it practically by heart. Computers can get in the way of that, by wasting your time, e.g. fighting your attempts to reformat text. (Microsoft Word excels at that.)

Word processing software

You can do a lot with a powerful word processor, preferably one with outlining ability, word counts, pattern-searching, and ability to sort selected paragraphs. Word, Ami Pro, and Word Perfect are all in common use. I prefer Ami Pro, which I find less wilful than the other two; and it has better graphing facilities - useful when writing reports. That's on a PC.

On a Mac, I've had more trouble. Word (in the new '98 version) has overcome its previous painful slowness, but its constant attempts to be helpful waste more time than they save - and take forever to disable. (It reminds me of a very willing, but very dumb dog.) Word Perfect has serious screen display problems, and Ami Pro isn't available. Appleworks is OK, but lacks some facilities. Other possibilities are BBEdit (the full version, not the cut-down shareware one), and the powerful text processor Qued/M - if it's still around. Nisus can do anything - once you've learned to use it.

Apart from those general-purpose text-handling programs there's a wide range of special programs for qualitative research. Unlike the statistical and survey tabulation programs listed above, which all do more or less the same kind of thing, the qualitative programs all tend to do different kinds of things. Here are some that I've tried, and found helpful.

Inspiration

This is like an outlining program which does diagrams. That doesn't sound like much, but everybody who uses it raves about it. And you can learn it quickly, too.

Decision Explorer

Another program which expresses concepts in diagram form - designed for working through decisions and their ramifications. Not suited to large chunks of verbatim text.

Nud*ist

Much more ambitious than the two above programs. It specializes in manipulating words and text, and has the most powerful set of searching capabilities I've ever come across. QSR software, the publishers of Nud*ist (in case you were wondering, the removal of clothing is purely metaphorical) have a lively listserv email discussion group, which you can subscribe to through their web site. If you have roomfuls and roomfuls of text to make sense of, Nud*ist may be what you need. But, like many of the other programs described here, it's not something you learn in a day or two. A new version of Nud*ist, entitled NVivo, has recently come out (mid-1999).

Two other programs vaguely similar to Nud*ist are Atlas/TI and The Ethnograph. I've never used these, but they are very highly regarded by their users.

Free-form text databases

Two database programs designed for free-form text could be useful for qualitative research. These are AskSam and InfoSelect. InfoSelect handles scattered small chunks of information, so it would probably be good for unstructured research - the preliminary stage, when you are collecting scraps of information on a new research topic, and don't yet know what to make of it all.

AskSam can handle either free-form text, or text in fields - which is more useful when you have a good idea of the concepts you are dealing with. A not-too-disabled demo version is available, so you can download it and see if it suits your purpose.

TACT

A set of DOS shareware programs for analysing texts, designed for use by literary scholars, but also usable by qualitative researchers. (Their approaches can be surprisingly similar.)

There are many more programs designed for qualitative data analysis, but I haven't used most of them. An excellent book by Weitzman and Miles entitled Computer Programs for Qualitative Data Analysis describes a wide variety of them. Though it's now outdated (having been published back in 1995) a new edition is on the way.

A useful link is CAQDAS (Computer-assisted qualitative data analysis software).

(8) How to choose research software

Which program is best?

The answer to this question depends on
  • What level of analysis you need to use
  • How much money you have available
  • How much time you're willing to invest
  • What skills and facilities you already possess
  • What your friends are using (so that you can call on them for help)

Whichever option applies, you should have - or be prepared to acquire - a reasonable command of descriptive statistics. (You need to understand averages, medians, percentages, frequency distributions, and so on. A course on descriptive statistics would probably need to include about 30 hours of class time.) Even if you plan a qualitative approach, you should at least be able to argue your case when somebody asks "Why didn't you do a straight survey?"
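
For readers unsure what "descriptive statistics" covers, here is a short Python sketch of the four items just mentioned, using only the standard library. The listening figures are invented, and the function name describe is just a label for this example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical answers to "How many hours did you listen yesterday?"
hours = [0, 1, 1, 2, 2, 2, 3, 4, 5, 10]


def describe(values):
    """Return the average, median, frequency distribution and percentages."""
    frequencies = Counter(values)
    percentages = {v: 100 * n / len(values) for v, n in frequencies.items()}
    return {
        "mean": mean(values),
        "median": median(values),
        "frequencies": dict(frequencies),
        "percentages": percentages,
    }


summary = describe(hours)
print("Average:", summary["mean"])
print("Median:", summary["median"])
for value in sorted(summary["frequencies"]):
    print(f"{value} hours: {summary['frequencies'][value]} people "
          f"({summary['percentages'][value]:.0f}%)")
```

Notice that the average is pulled above the median by the one person who listened for 10 hours. Knowing when that matters is the kind of understanding a course gives you and software doesn't.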

On top of knowing something about statistics, you also need to know how to use the software. If you're not willing to do that, find somebody else who will do your survey analysis for you - perhaps in a social science department in a nearby university.

If you're already an expert with a spreadsheet and/or database program, and not planning to do more than one survey, use the program you know. You may have to stretch its facilities, but the latest spreadsheets and databases have some capability for statistical analysis.

If money's not a major problem, and you have (or will get) a computer running Windows 95 or later, with at least 64 megabytes of memory, I recommend SPSS. It's not cheap, but it's widely used - so there's a good chance you can do a local course on it, or get help from an expert. And the latest version is quite good to use. Though it has many annoying features, so do its competitors.

If you use a Macintosh, I can't recommend SPSS - at least until version 10 comes out and is tested thoroughly by users. The current Mac version (6) is not as up to date as the Windows version (10), and has problems that the SPSS help line (in Australia) seems unable to solve. For survey analysis on the Mac, I provisionally recommend Powertab, which is cheap and seems easy to use - but I've discovered this only recently and haven't had a chance to try it out yet. Datadesk is said to be excellent, though very expensive.

If money's a problem for you, and you have access to a computer running MS-DOS, with 640K of memory, the best option is Epi Info. Its weak point is the labelling of variables, so the tables and graphs it produces always need to be embedded in an explanation of the exact variables and values being analysed. However the built-in word processor makes this easy.

If you use neither a Mac nor a PC, SPSS is available on a huge range of computers, though some versions are more advanced than others. SAS is

Please bear in mind that the above comments are based on my own experience. As it's a few years since I've used some of the programs, some of the problems I've mentioned may no longer exist. Please let me (Dennis List) know if you've spotted a mistake, and I'll correct it.

For all I know, there's a marvellous piece of survey analysis software that I've never heard of. (After all, I discovered Epi Info only in 1997 - after working with surveys since the 1970s.) If you know of such a software gem, please tell me about it, instantly! I'm thinking of something easier to use than recent versions of SPSS, and as lean as Epi Info, but with...

  • data entry facilities optimized for accuracy and speed
  • no problems processing alphabetic data
  • handles open-ended questions without fuss (qualitative and quantitative in the one package)
  • powerful data-sifting abilities (e.g. automatically spotting rogue cases)
  • with macros or scripting which can automate repetitive work
  • online help which can solve your immediate problems without your having to scroll through endless screens of irrelevant stuff (i.e. as good as Epi Info)
  • a clearly-written manual, which exactly matches the software
  • quickly producing presentation-quality tables and graphs
  • population-projection options that take account of the various types of missing data,
  • a "wizard" to help less-expert users choose the most appropriate statistical tests for their needs - and explain why,
  • and a reasonable price - no more than 200 US dollars.
... Software developers of the world - are you listening?


© Audience Dialogue ~ September 2000
Email us: info@audiencedialogue.org