Guest Blog

Bad Science and Beyond

“I just don’t care!” the doctor said, in response to a query on a personal health-related issue during a recent New Zealand television interview.

I had certainly never heard a doctor speak in this way – and it made me curious to hear more in his live talk.

I was not disappointed. The thought-provoking comments kept coming: “The right way to build public trust is to earn public trust, and to share data” and “The paradigm of medicine has somewhat shifted”.

In his unique and energetic style, Dr. Ben Goldacre got his message across to the audience at the Mercury Theatre in Auckland this September. A British physician, researcher, columnist, and author, Dr. Goldacre has made it his mission to tackle “bad science”, whether it is used by drug companies, politicians, journalists, or researchers.

Dr. Goldacre walked the audience through the research life cycle (objective – data collection – data analysis – publication – evaluation), showing the easiest ways to misrepresent science and exposing the shortcomings of medicine.

Through statistical data taken from newspapers, advertisements, and research reports, the audience learned how so-called “in-depth scientific research” can be used as a clever marketing tool and how research objectives can often be driven by profit.

Ben Goldacre - Bad Science

Photo credit: Dr Michelle Dickinson

One telling example of misrepresented data is the sudden drop in the number of tonsillectomies carried out on children in Hornsey, North London. There was a big discrepancy before and after 1929 – that is, from a few hundred cases down to almost nothing. After some investigation, it was revealed that the decline in the number of tonsil operations coincided with the retirement of one individual medical officer at a particular school, who was replaced by someone with a different opinion as to the merits of the treatment. Such a case reveals the power of doctors’ choices, rather than patients’ needs. It also shows how unreliable data can be if the full picture is not disclosed.

Sharing a more recent case, Dr. Goldacre discussed statins, medicines that lower cholesterol in the blood. Many treatment options have been shown to lower cholesterol compared with a placebo, but they have not been tested against one another to determine their “real world effects”, including on death. Dr. Goldacre and his team approached the UK National Health Service (NHS) to collect patient information. If patients agreed, doctors would be able to randomize the treatment options, ultimately finding the optimal treatment using considerably fewer resources than the traditional “door knocking” method of data collection. However, this was thwarted by opposition from ethics groups arguing that patients should have a choice.

Dr. Goldacre’s presentation led me to wonder: what role does the general public play in formulating bad science? Surely it is not just the domain of motivated organisations, unreliable researchers, and sensationalist media. Bad science can be, and often is, disseminated by people from all walks of life.

How can we stop its spread? Transparency may be the answer – including transparency of research objectives, processes, and publication. Whether research objectives are set for the benefit of all or purely to improve a company’s bottom line, the public should know – and in a language they can understand. Let consumers make up their own minds.

Silencing Science - Shaun Hendy

Unfortunately, providing scientific data to the wider public and expecting people to reach their own conclusions may not be sufficient. Effective communication is also a critical element in combatting bad science. Nowhere is this more eloquently stated than in Professor Shaun Hendy’s timely book, Silencing Science: “The job of the scientist is not just to deliver the facts, but also to engage democratically to assist the community to weigh the full breadth of evidence” (p96).

Indeed, scientists and research providers should communicate well, working together in the best interests of the public.

As consumers and citizens in a world of information overload, we can easily be misled by a well-packaged data snapshot. We need a reliable and complete picture, presented in terms we can understand. This will allow us to make our own choices in areas as broad as health, education, career, lifestyle, and more.

As a student of science and maths, I am beginning to grasp the moral imperative of the scientific community. Scientists should be answerable not only to their funders, but to everyone. A good start would be the publication of research findings representing the whole truth.


Tristan Pang

14-year-old Tristan Pang is a maths and physics major at the University of Auckland. He is also the creator of Tristan’s Learning Hub, producer and broadcaster of Youth Voices, founder and webmaster of several community websites, and a frequent speaker at schools, organizations and conferences, and he tutors students from primary school level through to university. He aspires to make a difference in the world.

A more social network

In the immortal words of Vanilla Ice: Stop, collaborate and listen. Collaboration is a cornerstone of modern science, and with flights cheaper than ever before and the internet effectively eliminating the expense of correspondence, academics and researchers are looking further afield and reaching more contemporaries across the globe. However, different institutions have different facilities and research focuses, and not everyone speaks the same language, so researchers may be picky about who they work with. This raises the question: do they have a preference for collaborators based on affiliation and, if so, can this preference be measured and distilled into cold, hard data?

Of course they do, and of course it can be. More to the point, why?

Arguably the most tangible and conveniently quantifiable way in which academic collaboration manifests is in scientific papers and articles, which typically have several authors from varying affiliations. A notable drawback of previous studies on research collaboration is that the measures used (such as the fractional count detailed in Nature Index) consider results for each institution, rather than for each individual academic, and disregard the size of each institution; as a result, smaller and younger institutions may stack up unfavourably compared with larger, more established ones. For example, take a look at how the eight New Zealand universities compare against each other:

  • The nodes representing each university are weighted by their respective output (total number of co-authored papers by academics affiliated with these universities).
  • The links connecting universities to each other are weighted by the number of papers co-authored by researchers from both institutions.
  • The higher the link weight, the more that the connected universities are attracted to each other.
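As a rough illustration of how node and link weights like these could be derived, here is a minimal Python sketch. It assumes each paper is represented simply as the set of universities its authors are affiliated with; the function name `network_weights` and the sample papers are hypothetical, and counting every paper towards a node’s weight is our assumption rather than the exact definition behind the figure.

```python
from collections import Counter
from itertools import combinations


def network_weights(papers):
    """Return (node_weight, link_weight) for a list of papers.

    Each paper is an iterable of affiliations (hypothetical input format).
    node_weight[u]      = number of papers with at least one author from u
    link_weight[(u, v)] = number of papers with authors from both u and v
    """
    node_weight = Counter()
    link_weight = Counter()
    for affiliations in papers:
        unis = sorted(set(affiliations))  # count each university once per paper
        node_weight.update(unis)
        for u, v in combinations(unis, 2):  # every pair of distinct universities
            link_weight[(u, v)] += 1
    return node_weight, link_weight


# Hypothetical example data, not taken from the study.
papers = [
    {"Auckland", "AUT"},
    {"Auckland", "Canterbury"},
    {"Auckland"},
    {"Lincoln", "Canterbury", "Auckland"},
]
nodes, links = network_weights(papers)
print(nodes["Auckland"], links[("Auckland", "Canterbury")])  # 4 2
```

Sorting each paper’s affiliations means every link is stored under one alphabetically ordered key, so a pair is never counted twice under (u, v) and (v, u).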

The skewing effect that university size has on this network is apparent from how Lincoln University, given its relatively small output, has far fewer co-authorships with Victoria University and the University of Waikato than with the rest of the network. Also of note is that the University of Auckland and AUT have a much lower link weight than one would expect for two universities across the street from each other, yet the University of Auckland and the University of Canterbury have a much stronger link despite being at opposite ends of the country.

First, we address the effect of institution output. We do this using a measure we call the revealed comparative preference (RCP) of an institution i for collaborating with institution j:

$$P_{ij} = \frac{X_{ij} / X_i}{X_j / X}$$

where $X_{ij}$ is the number of co-authorships between $i$ and $j$, $X_i$ is the total number of papers co-authored by $i$ with other institutions in the data set ($X_j$ is defined likewise for $j$), and $X$ is the total number of co-authorships between all the institutions in the data set.

Plainly speaking, it’s a measure of whether two institutions are collaborating with each other more than we might expect, relative to their tendency to collaborate with the other universities in the data set. If $P_{ij} > 1$, then universities $i$ and $j$ share more co-authorships than we expect relative to the other institutions in the data set, so we say they have a comparative preference for collaborating with each other. Conversely, $P_{ij} < 1$ indicates that the two universities are collaborating less than we might expect.
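To make the RCP calculation concrete, here is a minimal Python sketch. It assumes the co-authorship counts are held in a symmetric matrix with zeros on the diagonal, and that the grand total X is taken as the sum of all the X_i values (so each co-authorship is counted from both ends), which makes P = 1 the "no preference" baseline. The function name `rcp_matrix` and the counts are hypothetical.

```python
def rcp_matrix(counts):
    """Revealed comparative preference for a symmetric co-authorship matrix.

    counts[i][j] = X_ij, the number of co-authorships between i and j
    (assumed symmetric, with zeros on the diagonal).
    Returns P with P[i][j] = (X_ij / X_i) / (X_j / X).
    """
    n = len(counts)
    row_totals = [sum(row) for row in counts]  # X_i for each institution
    total = sum(row_totals)                    # X, assumed = sum of all X_i
    rcp = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and row_totals[i] > 0 and row_totals[j] > 0:
                rcp[i][j] = (counts[i][j] / row_totals[i]) / (row_totals[j] / total)
    return rcp


# Hypothetical counts for three institutions A, B and C.
counts = [
    [0, 10, 2],
    [10, 0, 4],
    [2, 4, 0],
]
P = rcp_matrix(counts)
print(round(P[0][1], 2), round(P[0][2], 2))  # 1.9 0.89
```

In this toy example, A and B collaborate more than expected given their overall activity (P > 1), while A and C collaborate less (P < 1). Because the formula reduces to X_ij X / (X_i X_j), the result is symmetric in i and j.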

Anyway. Here’s the NZ university network revised with the links now weighted by their corresponding RCP values:

Better. Here it’s apparent that AUT has a stronger link with Auckland Uni in addition to Lincoln and Waikato, and it should be pointed out that University of Auckland, AUT and Massey University are also closer to each other in the network, bearing in mind that all three have campuses within Auckland.

Now with a working measure, we move on to a larger sample. Bring on the Australians.

Clearly the Tasman Sea has a strong effect on the way New Zealand-based researchers connect with those based in Australia; the links within the NZ cluster of universities have greater RCP weightings than those within the Australian cluster, implying a preference for domestic rather than trans-Tasman co-operation. Another feature to consider is that Australian universities in the same state are grouped together, which is consistent with the idea that geographical proximity plays a significant part in a researcher’s choice of collaborator.

It would only be natural to wonder how academics interact on a global scale – do we ever grow out of talking almost exclusively to our friends and shunning outsiders in some weird, grown-up, Mean Girls-esque collection of cliques?

From observing how the Dutch and German institutions are grouped together, we might conclude that the language barrier is a large hurdle to overcome when jointly writing scientific literature. This also seems apparent from the Chinese-Hong Kong cluster, as well as from the clustering of Korean and Japanese institutions. But languages also tend to cluster geographically, so it is hard to disentangle the effect of language from that of distance.

There is no question that, with the constant progress of technology, connecting with people is becoming less costly. However, some factors still impede the prospect of a totally connected scientific community, and we have speculated on a few of them here. Of course, pictures and hand-waving don’t constitute a solid argument, but a thorough analysis of these factors and their effect on university collaboration is in store for you, dear reader.

In the meantime, perhaps one should learn German, or Mandarin, or Dutch, or even Japanese. It’s not that hard.

About the data visualisations
In order to make the larger graphs efficient enough to display in a browser, each node’s connections to other nodes were limited to those with its top four RCP values. This change had no significant effect on the clustering observed when the full connection matrix was used. The change was only implemented for the QS, ANZAC and benchmark data sets.
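The pruning step described above can be sketched as follows. This is a minimal Python illustration, assuming the RCP values are held in a square matrix (a list of lists); `top_k_links` is a hypothetical name, and keeping a link when it ranks in the top four for either endpoint is our assumption, since it is not stated how the two endpoints’ choices were combined.

```python
def top_k_links(rcp, k=4):
    """Keep, for each node, only its k links with the highest RCP values.

    Returns a set of undirected links (i, j) with i < j. A link survives if
    it is among the top k for either endpoint (an assumption, see above).
    """
    n = len(rcp)
    kept = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        best = sorted(others, key=lambda j: rcp[i][j], reverse=True)[:k]
        for j in best:
            kept.add((min(i, j), max(i, j)))  # store each link once
    return kept


# Hypothetical RCP values for three institutions, keeping only each node's top link.
rcp = [
    [0.0, 3.0, 1.0],
    [3.0, 0.0, 2.0],
    [1.0, 2.0, 0.0],
]
print(sorted(top_k_links(rcp, k=1)))  # [(0, 1), (1, 2)]
```

With k = 1, node 0 keeps its link to node 1, node 1 also picks node 0, and node 2 picks node 1, so the weak 0–2 link is dropped. Each node still has at least one connection, which is what keeps the pruned graph readable without breaking it into isolated points.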


Bonnie Yu is a research assistant at Te Pūnaha Matatini and a member of Te Pūnaha Matatini’s Whānau group for emerging scientists. Her research projects focus on university collaboration networks.

The data visualisations of this post were prepared by fellow research assistant, Nickolas Morton.

University of Auckland Equity reporting – also not such a silly idea?

By Caleb Gemmell and Catherine Webb

The Equity Office – Te Ara Tautika – at the University of Auckland has published an equity profile of the university for 2014. The purpose of these yearly publications is admirable – to transparently report how the university is progressing towards its goals of ethnic, gender and disability equity. Upon reading it, however, we were puzzled by the presentation of some of the data. So, in the helpful spirit of our previous Not Such a Silly Idea blog series, we started doing a little deconstruction.

Page 47 is about women in the ‘academic pipeline’ for advancement to senior positions. It shows the presence of women in the university from junior positions (undergraduate students) through to senior positions (professors).

There are several issues with this graph. First of all, can you tell what the Y axis is trying to measure? The numbers are too small to be a raw headcount. A percentage, then. Eventually, we worked out that this is the percentage of the total FTE (full-time equivalent) made up by women. Please label your axes.

Secondly, this graph looks deceptively like a time series. It gives the impression that we are following a cohort of women over time through their academic careers (which would actually be very useful, by the way). However, the data is simply a snapshot in time across many categories. Truly, there is no reason for this data to be presented in a line graph; this is an ordinal scale, and the categories are discrete. It would be better represented by something like a bar chart.

The report’s use of the term ‘academic pipeline’ to describe the chart suggests that the report’s authors are attempting to show the retention of women as they progress through university careers, to gain an idea of where and with what magnitude women drop out of academia. Unfortunately, the overall impression we get from this graph – that women’s university careers tend to wane above Assistant Lecturer[1] until an uptick in the Senior Leadership Team – is not a valid conclusion to draw from a snapshot of data that compares the results of non-contemporary cohorts.

Another reason the pipeline analogy is inappropriate here is that it implies it is normal to progress from student to staff, and from teaching staff to management. This is not the norm. It is fairly unusual to enter the workforce as a teaching academic with little prior experience. The inclusion of the Senior Leadership Team on the end of this progression is especially strange, since many of those positions are not academic at all – for instance, directors of HR, library, or finance – and certainly don’t require you to be a professor first!

So, we tried to present the Equity Office’s information in a more useful way.


This pulls out the clearest information the data has to offer, showing how the gender composition of each academic position has changed since 2000. Whereas the line graph appeared to prompt lateral comparison, this graph highlights the change over time within each category. It cannot be called a pipeline, but it is useful!

You can see that there is now a more granular breakdown, including HoDs and deans. We did this to be more informative and to smooth the sudden jolt between professors and senior leadership. Something we noticed, apart from the greatly improved gender ratio in almost every category in 2014, was the huge change in the ‘Heads of Departments and Schools’ category. We realised that the Faculty of Arts and the Faculty of Medical and Health Sciences (FMHS) were the only faculties which included both departments and schools, and that the schools were created after 2000, in addition to the existing departments. That meant a greater number of heads in total, but not necessarily a greater proportion of women.

[1] The use of the category Assistant Lecturer is itself curious, since no such employment position exists at the University of Auckland. The report’s authors say that the Assistant Lecturer category “…includes GTAs [graduate teaching assistants] tutors etc.” However, GTAs are typically postgraduate students, so they would be represented twice in this data. The position of Tutor at UoA has not been in use for several years, having been replaced by the position of Professional Teaching Fellow (PTF), which is presumably included in the “etc.” Missing from the data are post-doctoral fellows, research fellows and senior research fellows. These positions often make up a sizeable fraction of many departments, the latter two positions being comparable in seniority to the positions of Lecturer and Senior Lecturer. The data is further confused by the fact that the mid-ranges of the pipeline might be better thought of as three parallel streams: PTFs (responsible for teaching only), Research Fellows (typically engaged in research, but not teaching), and Lecturers (engaged in both research and teaching).

Meet the team: Q&A with Stephen Marsland

Meet Stephen Marsland – a professor of scientific computing in the computer science cluster of the School of Engineering and Advanced Technology (SEAT) at Massey University. Stephen is also Te Pūnaha Matatini’s new Theme Leader: Complex Data Analytics. “Data is cool at the moment… but it would be nice to see people using it well and understanding what they can and can’t infer from analysis,” Stephen says. Find out more about Stephen’s research and what he hopes to achieve in his new role in the Q&A below.

Tell us about your research, including projects aligned with Te Pūnaha Matatini

My first area of research is the mathematics of shape analysis. This is primarily concerned with geodesics on the diffeomorphism group, which is a mathematical way of describing how flows of smooth, invertible transformations can deform one shape into another in the shortest possible way. I also study invariants to the actions of the groups that can deform images.

More related to Te Pūnaha Matatini is my work in machine learning, which has two parts at the moment: I’m thinking about manifold learning, where we try to find low-dimensional representations of high-dimensional data, and I’m also thinking about dealing with learning about multiple sources of data where all that you see is the combination of the sources. The first is a popular question, but I’m thinking about it very much from the point of view of differential geometry, and how that can help. I’ve got multiple projects going on there with collaborators in England and China.

The second project is with Marcus Frean, another Te Pūnaha Matatini principal investigator. For example, you might see images of an object on different backgrounds, and you want to work out that the object and the background are different pieces of information.

I’ve got a very big project called AviaNZ going on that combines the shape analysis and machine learning, which is looking at birdsong recognition, in the hope that we can develop algorithms that will recognise birds from their calls and then infer the number of birds from how they are calling.

Finally, I’m interested in complex systems in their classic sense, both complex networks (which are networks with properties such as scale-freeness, or that are small worlds) and also systems where the interactions between agents cause the emergence of high-level properties. I’ve got a variety of projects with students looking at this, including in health, marriage systems, and soon, the evolution of barter (this last one will be funded by Te Pūnaha Matatini).

What attracted you to the role of Theme Leader: Complex Data Analytics?

Data analytics underlies everything that we are trying to do in Te Pūnaha Matatini, but it isn’t really getting the recognition as a subject in its own right. I’m hoping that by exploring more of the links with the other themes I can make people more aware of how much data analytics there is going on, and what tools are available.

How can complex data analytics benefit New Zealand?

Data is cool at the moment (big data is mentioned everywhere) but it would be nice to see people using it well and understanding what they can and can’t infer from analysis. We collect data everywhere on everything, but lots of it doesn’t actually get used for much. For example, there are thousands of automatic recorders around New Zealand recording birdsong. But unless you have tools to analyse the data, you’ve just got a lot of memory used up storing sound that nobody will ever pay any attention to. Turning data into information isn’t easy, but it has to be done, and done well, to make the collection of the data worthwhile.



Meet the team: Q&A with Mike Plank

We recently caught up with Principal Investigator Dr Michael Plank, a senior lecturer in the School of Mathematics and Statistics at the University of Canterbury. Mike has taken on the role of Theme Leader: Complexity and the Biosphere while Alex James is on hiatus. As a research theme leader, Mike will be steering Te Pūnaha Matatini’s research projects that build a better understanding of New Zealand’s environment and the interactions between biodiversity, the economy, and human decision-making.

Tell us about your research, including projects aligned with Te Pūnaha Matatini

My research is in biological modelling and ranges from the very small (intracellular dynamics) to the very large (marine ecosystems). A common theme in my research is investigating how collective phenomena emerge from interactions among individuals, whether on the scale of a single human cell, or the scale of an ocean. I am interested in the insights that relatively simple mathematical models can give into the ways these complex systems function – and why they sometimes go wrong.

One of my projects aligned with Te Pūnaha Matatini is modelling the emergent behaviour of fishers stemming from their decisions about which species or sizes of fish to target. Principles from ecology suggest that natural predators tend to spread their effort according to the productivity of their prey. So why shouldn’t humans behave like natural predators and spread their fishing efforts according to the productivity of the fish? If this really happens, it could change the way we design fishing regulations from top-down control to a bottom-up approach that recognises the effect of the fish stock on the behaviour of fishers as well as the other way round.

What attracted you to the role of Theme Leader: Complexity and the Biosphere?

We have some really exciting projects going on in the Biosphere theme. I’m really looking forward to a new project that will look at the interplay of ecological dynamics, geospatial data, and social attitudes to map the effectiveness of large-scale predator control. Other projects include investigating the effects of social contact networks on epidemic spread, and harnessing the huge potential of citizen science to enhance conservation projects.

We have some amazing scientists and students involved with these projects and I’m excited to work with them and see how we can turn the scientific results into real impacts for New Zealand’s unique ecosystems.

How can research using complex systems, networks, and data assist New Zealand’s environment?

New Zealand is facing a range of pressing environmental issues, including loss of our endemic native flora and fauna, agricultural pest invasions, and management of our fisheries. We have a large amount of data relating to these, for example the Department of Conservation’s tier 1 monitoring programme, and catch data from our Quota Management System. At the same time, we’re investing substantial money and resources into these areas, but we’re not always making full use of the data that are available. Te Pūnaha Matatini’s research programme has the potential to really add value to our conservation dollar by helping us target our resources to areas where they will have the most impact.

Taking a complex systems and network approach also gives us opportunities to look at environmental issues at a larger spatial scale, rather than focusing on projects in isolation. As a simple example, a predator control programme in an area of Department of Conservation land might reduce or even eliminate the possum population in the short-term. But if there is adjacent, privately owned land without any control, the possums are likely to re-invade in the long-term. Viewing the whole country as an interconnected network gives us a better ability to predict long-term outcomes, and therefore a better chance of eliminating possums for good.

My First Conference(s)

By Jonathan Goodman

Never do things by halves, jump in the deep end, give it a go, eat your vegetables, trust your supervisors. This is all good advice and I now realise I must have taken it, having presented at the very first conference I attended and then, three days later, attended another conference run by an organisation I had never heard of before. I have also joined the Te Pūnaha Matatini Whānau committee based solely on my supervisor’s advice. Before I go on, I must admit that all of these actions have proved to be worthwhile and rewarding.

The first conference was the Te Pūnaha Matatini cross-theme hui. This was the first Te Pūnaha Matatini gathering I have attended since joining the Centre of Research Excellence as a PhD student at the start of the year. The hui consisted of a series of short talks, including my first at a conference, interspersed with four rounds of the “Research Knockout” – a game designed by Alex James. The game started with the creation of teams of 3-5 researchers from Te Pūnaha Matatini’s three research themes. Each team then generated a potential research project. Each round of the knockout consisted of pairing up the groups and amalgamating their ideas into an enhanced version. This continued until there were just two groups remaining. In the grand finale, there was a final presentation followed by a vote. The winning research topic was ‘Measuring the impact of the communication of science’.

The question of science outreach also came up at the conference run by the New Zealand Association of Scientists (NZAS). The conference was held at Te Papa in Wellington and celebrated the 75th anniversary of the Association. The conference had a selection of engaging speakers looking at the role of scientists in the past, the present, and into the future. A number of speakers talked about science communication.

One of the presenters, Simon Nathan, spoke about James Hector and how he effectively pushed the cause of New Zealand science, through his role of Chief Government Scientist, by constantly reminding politicians about the value of science. Rebecca Priestley talked about how science outreach was different back in the days of the Department of Scientific and Industrial Research (DSIR). Instead of scientists engaging in outreach programs, interested journalists and citizens would phone and be able to speak directly with the scientist who was in the best position to answer their queries. Te Pūnaha Matatini’s own Shaun Hendy presented on how social media is currently the only way scientists are able to directly communicate with the population without the risk of their message being obscured. His three guidelines for public engagement were very apt.

Researchers should:

1) Not be d!@#s

2) Get on social media

3) See rule number 1.

The other major theme of the conference was the structure of the pathways inside and outside academia for emerging researchers. I will touch on this in another blog post on the Te Pūnaha Matatini Whānau page.

Having had a rewarding weekend forming connections with talented scientists, and with the science community as a whole, I will sign off hoping that I have followed Shaun’s rules.

Jonathan Goodman

NSSI (Not Such a Silly Idea… but do it properly) #5

Welcome back to Not Such a Silly Idea! In this exciting new season, two summer students continue to critique a government document, and this time they have interactive visualisations! In case you missed the first part of our epic journey, you can read instalments one, two, three and four.

One of the claims made in the National Statement of Science Investment (NSSI) which we wanted to look into was based on this graph (Figure 1) on page 20 of the NSSI:

Figure 1: Science and Innovation spend - New Zealand Treasury Estimates

The NSSI asserts that the government’s investment in science has “increased by over 70% since 2007/08”, and that “our science has improved in response”. This is followed with several graphs which show things like increasing publication rate, increasing number of papers in top journals, growing research workforce, and increasing international collaboration over time. These graphs each stand on their own, but fail to relate these improvements to the amount of money the government has been spending on science. One of our goals was to re-visualise these graphs in a way that clearly showed a correlation (or not) with increasing government investment, and we will address that later on. But before that, we had to investigate this data on government spending.

We remain puzzled that the NSSI claims an “over 70%” increase in government expenditure in the last eight years, when according to its own data the increase is more than 80%. Self-deprecation aside, when we went back to The Treasury we discovered that the graph on page 20 is not adjusted for inflation. This immediately indicated that the increase in spending was not quite as significant as claimed, since part of the nominal increase merely compensated for the devaluing dollar. Using the Reserve Bank calculator, we found that government spending on science has actually increased 55% (not 82%) since 2007/08 and 46% (not 87%) since 2004/05; taken together, these figures imply cumulative inflation of roughly 17% and 28% over those periods (for example, 1.87 / 1.46 ≈ 1.28).
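The inflation adjustment amounts to dividing the nominal growth factor by the growth in the price level. Here is a minimal Python sketch, assuming a consumer price index (CPI) is used as the deflator; `real_growth` is a hypothetical helper, and the spending and CPI figures are made up for illustration rather than being the Reserve Bank calculator’s actual inputs.

```python
def real_growth(nominal_start, nominal_end, cpi_start, cpi_end):
    """Percentage growth in spending after removing inflation.

    nominal_* are spending amounts in current dollars; cpi_* are price index
    values for the same two periods (hypothetical inputs).
    """
    nominal_factor = nominal_end / nominal_start  # e.g. 1.5 for a 50% nominal rise
    price_factor = cpi_end / cpi_start            # cumulative inflation factor
    return 100 * (nominal_factor / price_factor - 1)


# Hypothetical figures: spending rises from 100 to 150 while the CPI rises
# from 1000 to 1200 (20% cumulative inflation).
print(round(real_growth(100, 150, 1000, 1200)))  # 25
```

Note that the real increase is not simply the nominal increase minus inflation (50% minus 20% would give 30%); the growth factors divide, which is why the adjustment is done on ratios.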

After adjustment the government’s spending still showed a rise, however, so we started looking for the implications of that rise. We created the graph below to see whether the government might be able to claim New Zealand’s burgeoning number of publications as correlated with its financial support. To make this graph we had to include a time lag. Cash pays for research, but it takes time for that research to be published. We weren’t sure, though, how long that lag is on average. We did a regression analysis of our data using this equation, in case you’re interested:

[Figure: regression equation]

We used a range of time lags between investment and publication, from no lag to four years. The time lag that showed the strongest correlation was two years. (We have far too little data to infer anything universal from this – it is simply our best guess.)

And voila, there is a positive correlation between the amount of money poured into ‘science and innovation’ and the number of papers churned out. But what does the money do? Do scientists publish more frequently when they are better funded? In other words, does greater funding increase productivity measured in papers per full-time researcher? Or does the money go towards increasing the number of people producing papers in the first place?

Six years ago, Professor Shaun Hendy published this paper (page 56). It drew the conclusion that although our publication output increased hugely from 1990-2008, that increased output was due to a rise in the number of researchers, not the number of papers each researcher produces in a year. Having read this paper, we expected to find similar results for 2000-2014, but we were surprised to see that both FTEs and productivity have been steadily on the rise, according to both OECD and Statistics New Zealand data.
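The question of whether extra output comes from more researchers or from more papers per researcher can be made concrete with a simple identity: publications = FTEs × (publications per FTE), so on a log scale, growth in output splits exactly into growth in FTEs plus growth in productivity. Here is a minimal Python sketch; the helper name `decompose_growth` and the numbers are hypothetical, not the SciVal, OECD or Statistics NZ figures.

```python
import math


def decompose_growth(papers_start, papers_end, fte_start, fte_end):
    """Split growth in publications into a researcher (FTE) part and a
    productivity (papers per FTE) part.

    Uses papers = FTE * (papers / FTE), so on a log scale the two parts add
    up exactly. Returns the share of log growth from each; shares sum to 1.
    Undefined if publications did not change (zero total growth).
    """
    total = math.log(papers_end / papers_start)
    from_fte = math.log(fte_end / fte_start)
    from_productivity = total - from_fte  # equals log of the change in papers per FTE
    return from_fte / total, from_productivity / total


# Hypothetical numbers: output quadruples while FTEs double.
fte_share, productivity_share = decompose_growth(1000, 4000, 500, 1000)
print(round(fte_share, 2), round(productivity_share, 2))  # 0.5 0.5
```

A finding like the one described above, where rising output was due to more researchers rather than more papers each, would show up here as an FTE share close to 1 and a productivity share close to 0.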

We retrieved the publication output numbers from SciVal, and we found two different sets of researcher full-time equivalent (FTE) data: one in the Main Science and Technology Indicators database on OECD.Stat, and the other in the R&D surveys from Statistics NZ. There was a confusing discrepancy between these sources because the latter breaks down Higher Education researcher FTEs into ‘researcher’ and ‘student researcher’, while OECD.Stat makes no distinction, and the numbers didn’t come to the same total. Our best guess is that one counts only PhD students, while the other also includes Masters students.

These two graphs are very interesting because, in spite of the differences between the sources, both show that the number of science researchers and their productivity have increased.
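The question of whether output grew through more researchers or through more papers per researcher can be made precise with a simple identity: publications = FTEs × (publications per FTE). Because of this identity, the log growth of output splits exactly into a workforce term and a productivity term. Here is a sketch that uses invented numbers, not the SciVal, OECD or Statistics NZ figures:

```python
from math import log


def decompose_growth(pubs_start, pubs_end, fte_start, fte_end):
    """Split log growth in publications into workforce and productivity parts.

    Uses publications = FTE * productivity, so
    log(P1/P0) = log(F1/F0) + log(q1/q0), where q = P / F.
    Returns the share of total log growth attributable to each part.
    Undefined (division by zero) if publications did not change at all.
    """
    total = log(pubs_end / pubs_start)
    workforce = log(fte_end / fte_start)
    productivity = log((pubs_end / fte_end) / (pubs_start / fte_start))
    return workforce / total, productivity / total


# Hypothetical figures: output doubles while researcher FTEs rise by 60%.
fte_share, productivity_share = decompose_growth(4000, 8000, 10000, 16000)
print(f"workforce: {fte_share:.0%}, productivity: {productivity_share:.0%}")
```

A result in which both shares are positive corresponds to the pattern described here, where both the workforce and papers per researcher grew. If all the growth came from more researchers, the productivity share would be close to zero.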

So, apart from the fact that MBIE needs to take care to present information accurately, we can conclude that government investment in science has indeed increased, and that it is correlated with increased output of publications, a larger research workforce, and higher productivity. Of course, from these graphs alone we can’t be sure which way round the causal relationship works. A great incentive, surely, for both parties to keep up the good work!

NSSI (Not Such a Silly Idea… but do it properly) #4


By Catherine Webb and Nicola Gujer

A long time ago in a galaxy far, far away, we started deconstructing the National Statement of Science Investment and considering the claims it makes.

Our first blog post ended with an unanswered question – is the proportion of a country’s papers in the global top 10% of citations a good way (or at least the best available way) to measure science excellence? The NSSI’s definition of excellent science is “well-designed, well-performed, well-reported research, recognised as such, e.g. through peer review.”

In our first blog post, we examined a graph suggesting that New Zealand produces science that is not quite as ‘excellent’ as that of comparable small advanced economies, because a lower percentage of our publications make it into the 10% most cited publications worldwide. In that post, we discussed the limitations of that graph, but nonetheless concluded that it does show New Zealand lagging slightly in the production of ‘top-shelf science’.

Still, we were curious to see whether there are any more useful ways of measuring ‘science excellence’ that might paint a different picture.

What if just looking at the papers with the most citations is not the right approach? Are citations a good way to measure excellence at all? One flaw with citations is that a paper can become highly cited for two very different reasons: by being amazing, or by being such poor science that everyone wants to correct it (like Wolfe-Simon’s paper on a bacterium that thrives on arsenic). Also, as we discussed in our second post, citations tend to follow a ‘rich-get-richer’ power-law distribution, in which a well-known paper garners ever more citations while another paper, nearly as good, can dwell in obscurity all its life.
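The ‘rich-get-richer’ effect is easy to see in a toy simulation. The sketch below is a version of Price’s model of citation networks. It is chosen here purely as an illustration and is an assumption, not something the NSSI or SciVal uses. Papers arrive one at a time, and each new paper cites a few earlier papers chosen with probability proportional to (citations received + 1). The paper count, references per paper and random seed are arbitrary.

```python
import random
from statistics import mean, median


def simulate_citations(n_papers=5000, refs_per_paper=3, seed=1):
    """Toy Price-style model: citations go preferentially to already-cited papers."""
    rng = random.Random(seed)
    citations = [0]
    # Each paper appears in the urn once for its baseline "+1" and once more per
    # citation received, so a uniform draw from the urn picks a paper with
    # probability proportional to (citations + 1).
    urn = [0]
    for new_paper in range(1, n_papers):
        # A set, so a new paper cites up to refs_per_paper distinct earlier papers.
        cited = {rng.choice(urn) for _ in range(refs_per_paper)}
        for paper in cited:
            citations[paper] += 1
            urn.append(paper)
        citations.append(0)
        urn.append(new_paper)
    return citations


counts = simulate_citations()
print("mean:", round(mean(counts), 2), "median:", median(counts), "max:", max(counts))
```

Most simulated papers end up with few or no citations while a handful collect a large share. This happens even though every paper in the model is equally ‘good’: the model has no notion of quality at all.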

However, even if citations are not a great measure, they may currently be the least-bad way of assessing the impact of publications. But which citations should we use to compare countries? Every country has its own areas of specialty, and if a country is alone in an area, its papers may not be cited much by other countries, even if the science is top-notch. New Zealand, for example, is environmentally isolated and unique. Our conservation or agricultural papers, for instance, may be of more immediate relevance to us than to anyone else. If our science is genuinely useful to ourselves, but not to the rest of the world – should that make it less ‘excellent’?

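We therefore broke the data down into two separate measures: intranational citations per publication (citations from papers within the same country) and international citations per publication (citations from papers in other countries).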
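The split itself is straightforward once each citation is tagged with the country of the citing paper. Below is a minimal sketch that assumes, hypothetically, a single country per citing paper. Real bibliometric data has multinational author lists, so a database like SciVal has to apply some allocation rule that this toy ignores. The records are invented.

```python
def citations_per_publication(citing_countries_by_pub, home):
    """Return (intranational, international) citations per publication.

    citing_countries_by_pub: one list per publication from `home`, holding the
    country of each paper that cites it (a hypothetical simplified format).
    """
    n_pubs = len(citing_countries_by_pub)
    intra = sum(c == home for cites in citing_countries_by_pub for c in cites)
    total = sum(len(cites) for cites in citing_countries_by_pub)
    return intra / n_pubs, (total - intra) / n_pubs


# Four invented New Zealand publications and the countries of the papers citing them.
nz_pubs = [["NZ", "AU", "US"], ["NZ"], ["DK", "US", "US", "GB"], []]
print(citations_per_publication(nz_pubs, "NZ"))  # prints (0.5, 1.5)
```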

Because international citations make up the vast majority of any small country’s citations, they have the greatest influence on the percentage of publications in the 10% most cited. Thus, for ranking countries, international citations per publication and the top-10% measure can be used as rough proxies for each other.

Does New Zealand’s rate of intranational citations balance our lagging overall ranking?

Citations per publication, by origin of the citing paper:

Country          Intranational    International
New Zealand      1.25             5.95
Denmark          1.70             8.35
Finland          1.46             6.87
Ireland          1.13             7.05
Israel           1.23             6.98
Singapore        1.35             7.82
Mean average     1.35             7.17
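The means in the table can be checked directly. The same numbers also give each country’s intranational share of its citations per publication, intranational ÷ (intranational + international). That share is a derived ratio, not part of the original analysis. On these figures, New Zealand’s share comes out at roughly 17%, second only to Finland’s, against an unweighted six-country mean share of roughly 16%.

```python
from statistics import mean

# (intranational, international) citations per publication, from the table above.
table = {
    "New Zealand": (1.25, 5.95),
    "Denmark": (1.70, 8.35),
    "Finland": (1.46, 6.87),
    "Ireland": (1.13, 7.05),
    "Israel": (1.23, 6.98),
    "Singapore": (1.35, 7.82),
}

mean_intra = mean(v[0] for v in table.values())
mean_inter = mean(v[1] for v in table.values())

# Fraction of each country's citations per publication that come from within the country.
intra_share = {c: intra / (intra + inter) for c, (intra, inter) in table.items()}
mean_share = mean(intra_share.values())

print(f"mean intranational: {mean_intra:.2f}, mean international: {mean_inter:.2f}")
print(f"mean intranational share: {mean_share:.1%}")
for country, share in sorted(intra_share.items(), key=lambda kv: -kv[1]):
    print(f"{country:12s} {share:.1%}")
```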

It’s possible that New Zealand produces publications that are more relevant, and therefore more cited, within our own country than in other countries, but that we simply don’t cite enough to pull our average up very far. This is conjecture, however.

In any case, does New Zealand do well enough by intranational citations to let us off the hook of general lack-of-excellence? Well, we certainly have room for improvement. The next question is, obviously, how to improve – a subject for another article, where we examine government investment and its effect on the science sector.

Look out for our next episode in the New Year, and may the force be with you over the holidays!

*Featured image by XKCD comics.

NSSI (Not Such a Silly Idea… but do it properly) #3


By Catherine Webb and Nicola Gujer

The adventures in data-sleuthing continue. In this blog series, two summer students examine the National Statement of Science Investment 2015-2025, and appraise its use of data. So far we have found one case of partially-correct-yet-misleading-ness, and another of axis-labelling-deficit. This time we take a look at Academic-Corporate Collaboration.

On page 17, the NSSI makes an intriguing statement: “Only 3.2 per cent of New Zealand publications have academic and corporate affiliations, suggesting scope for more collaboration.” We asked, where did this information come from? By what standard do we have ‘scope for more collaboration’? And is that a good standard to use?

Implicit in that statement is the assumption that an abundance of academic-corporate collaboration is a good thing – a claim to be investigated later on.

First, we traced the “3.2%” used in the NSSI to ‘Academic-Corporate Collaboration in New Zealand’ on SciVal. The first thing we noticed was that this statistic changes over time, although it fluctuates less than field-weighted citation impact. Between the time it was retrieved for the NSSI (published in October) and the time we checked it in November, it had already fallen to 2.6%, a relative drop of around 19%. Hence, we wanted to check how stable this measure is over a longer period of time.


We didn’t find anything very remarkable in that regard: on average, the academic-corporate collaboration rate of each small advanced economy deviated about 17% from its initial value over five years, with New Zealand squarely on the mean (+16.7%).
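Both the 19% drop and the 17% average deviation are relative changes, measured against the starting value. In the short sketch below, the first call reproduces the 3.2% → 2.6% drop. The second function is one possible reading of ‘deviated about 17% from its initial value’: the mean absolute relative change from first to last year across countries. That reading is our assumption, since the exact method isn’t spelled out, and the series it is applied to are invented.

```python
from statistics import mean


def relative_change(old, new):
    """Change from `old` to `new`, as a fraction of `old` (negative = a fall)."""
    return (new - old) / old


def mean_abs_relative_change(series_by_country):
    """Average size of the first-to-last-year change across countries (assumed metric)."""
    return mean(abs(relative_change(s[0], s[-1])) for s in series_by_country.values())


# NSSI figure (3.2%) versus the value found in November (2.6%).
print(f"{relative_change(3.2, 2.6):.1%}")

# Invented five-year series of collaboration rates (%), not SciVal data.
toy = {"Country A": [2.0, 2.1, 2.3, 2.2, 2.4], "Country B": [4.0, 3.8, 3.5, 3.2, 3.0]}
print(f"{mean_abs_relative_change(toy):.1%}")
```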

This also helps us answer the second question: compared to what do we lack collaboration? The graph shows how our nation’s academic-corporate collaboration measures up to that of other small advanced economies (SAEs): Denmark, Finland, Ireland, Israel and Singapore (widely accepted as comparator countries due to similarities in factors that affect science performance). Using the same measure, this is the data as it stands in Nov/Dec 2015:

Percentage of publications with both academic and corporate affiliations:

New Zealand 2.6%
Denmark 4.8%
Finland 2.5%
Ireland 2.7%
Israel 3.3%
Singapore 2.0%
Mean average 3.0%
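The mean in this list, and the two comparisons drawn from it in the next paragraph, can be checked with a few lines of Python. This uses an unweighted mean, which treats each country equally regardless of how many papers it publishes.

```python
from statistics import mean

# Percentage of publications with both academic and corporate affiliations (Nov/Dec 2015).
rates = {
    "New Zealand": 2.6,
    "Denmark": 4.8,
    "Finland": 2.5,
    "Ireland": 2.7,
    "Israel": 3.3,
    "Singapore": 2.0,
}

sae_mean = mean(rates.values())
ranked = sorted(rates, key=rates.get, reverse=True)

print(f"SAE mean: {sae_mean:.2f}%")
print("NZ rank:", ranked.index("New Zealand") + 1, "of", len(ranked))
print("Earlier 3.2% figure above the current mean?", 3.2 > sae_mean)
```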

We see that by this standard, NZ is below average, but not markedly. We are still above Singapore and Finland, and with our ‘3.2%’ measured earlier in the year, we would have been above the present average!

Presumably, when the NSSI claims that New Zealand is lacking collaboration, they are using the small advanced economies as a reference – they cannot be referring to the worldwide percentage of academic-corporate collaboration, as that is only 1.34%. And yet, if they are comparing us to other SAEs, we are near the average and certainly not significantly underperforming.

Finally, however, we found a significant problem with the New Zealand statistics on SciVal. Academic-corporate collaboration is defined as at least one academic institution and one corporate (business) institution being affiliated with the same publication. On SciVal we found evidence that Crown Research Institutes (which are government entities and not private businesses) are being counted as corporate organisations. Here is one example of a paper listed as an academic-corporate collaboration:

[Image: example of a paper listed on SciVal as an academic-corporate collaboration, with the erroneous inclusion of CRIs as corporate highlighted]

As you can see, the only contributors to this paper are universities and two Crown Research Institutes: AgResearch and Landcare Research. Although our Crown Research Institutes have been ‘corporatised’, meaning that they are managed much like businesses, New Zealand is unique in this respect. Many countries have government science research organisations that are equivalent to our CRIs but are treated as purely public government institutes, such as the CSIRO in Australia. This presents an obstacle to drawing conclusions from this data set: comparing academic-corporate collaboration between countries is problematic when New Zealand calls corporate what other countries call government.

This inclusion of CRIs as corporations is skewing New Zealand’s overall collaboration statistics, but it is difficult to tell by how much. Unfortunately, SciVal does not offer collaboration data adjusted to exclude CRIs. CRIs cannot be excluded from the affiliation search without also excluding papers with genuine collaboration between universities and corporations, and SciVal’s lack of Boolean operators makes more nuanced searches impractical. Thus, we cannot provide a more correct number for New Zealand’s academic-corporate collaboration percentage than the one published in the NSSI. What we can say is that the NSSI’s number is not accurate: with CRIs excluded, NZ’s true academic-corporate collaboration percentage should be lower than the NSSI reports.
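The counting problem can be illustrated with a toy example. Suppose, hypothetically, that each paper is tagged with the set of sector types among its affiliations, and that Crown Research Institutes get their own ‘cri’ tag. Whether a paper counts as an academic-corporate collaboration then depends on whether ‘cri’ is treated as corporate. That is exactly the kind of Boolean condition SciVal’s search interface did not let us express. The tags and papers below are invented.

```python
def is_academic_corporate(sectors, cri_counts_as_corporate):
    """True if a paper has at least one academic and one corporate affiliation."""
    corporate = "corporate" in sectors or (cri_counts_as_corporate and "cri" in sectors)
    return "academic" in sectors and corporate


def collaboration_rate(papers, cri_counts_as_corporate):
    """Fraction of papers that count as academic-corporate collaborations."""
    hits = sum(is_academic_corporate(p, cri_counts_as_corporate) for p in papers)
    return hits / len(papers)


# Invented affiliation-sector sets for five papers.
papers = [
    {"academic", "corporate"},         # genuine collaboration
    {"academic", "cri"},               # like the AgResearch/Landcare Research example
    {"academic"},
    {"academic", "cri", "corporate"},  # genuine, and also involves a CRI
    {"corporate"},
]

print(collaboration_rate(papers, cri_counts_as_corporate=True))   # 0.6, CRIs as corporate
print(collaboration_rate(papers, cri_counts_as_corporate=False))  # 0.4, CRIs excluded
```

Reclassifying CRIs can only remove papers from the count, never add them. That is why the corrected New Zealand figure can only be lower (or, at best, equal), even though we could not compute it.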

We have to trust that a similar mistake has not been made in SciVal’s database for any of the other small advanced economies. Without a better dataset we cannot draw any conclusions about the potential for improving academic-corporate collaboration in New Zealand. If anything, this project has highlighted the need for comprehensive data-keeping, as well as taking care in how it is used.

*Featured image by XKCD comics.

NSSI (Not Such a Silly Idea… but do it properly) #2


By Catherine Webb and Nicola Gujer

In this blog series we pull apart the National Statement of Science Investment and scrutinise its use of data.  In case you missed it, in our last post we checked out the graph on page 19, and what it says about New Zealand’s academic performance.  This time we’re taking a look at page 18… because science.

Page 18 of the NSSI features a graph (Figure 1) of “Academic Output” for New Zealand, showing the number of publications in each discipline and the quality of that output, measured by field-weighted citation impact. We actually liked this graph; it seemed to show the information it claimed to (which sadly cannot be said of some of the other graphics in the NSSI). Alas, there was no numerical data or scale to be found on it. Yes, it showed the disciplines relative to each other, but surely adding a few values to an axis would have made it a more effective graph. So when we recreated it, all it needed was a little sprucing up with some numbers and she’d be right (see Figure 2).

Figure 1. NSSI Academic Output (edited to show the “Multidisciplinary” field) (NSSI, 2015)



Figure 2. Volume and quality of academic output for New Zealand between 2010 and 2014 (SciVal, 2015).



While recreating this graph, we ran into the same issue as the NSSI: the fields lower down become too small to fit any labels. Unfortunately, this is unavoidable with this data, as the largest number of publications is 19,993 for “Medicine” and the smallest is 401 for “Dentistry”, a roughly fifty-fold difference. With a gap that large, it will be hard to have both clearly visible and labelled.

One feature of this graph that definitely could have used some explanation is the mean quality of output for NZ and for the SAEs. At first glance we thought the two averages were weighted by the number of publications in each field, since the NZ line did not seem to be balanced by quality alone. On closer examination, we noticed that the thin line towards the bottom of the graph was in fact “Multidisciplinary” (an annoyingly large number yet again), which would explain why our average seemed larger. The mean lines on our graph are computed as publication-weighted means. We are not sure exactly what MBIE did, and we cannot reproduce it, because citations have accumulated since their data was retrieved, so we are not working with the same data.
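The difference between a simple mean across fields and a publication-weighted mean is easy to show with invented numbers. A small field with a high score (playing the role of “Multidisciplinary” here) pulls an unweighted average up far more than a weighted one. The field names, publication counts and scores below are hypothetical.

```python
# Hypothetical fields: (publication count, mean field-weighted citation impact).
fields = {
    "Field A": (10000, 1.0),
    "Field B": (2000, 1.5),
    "Multidisciplinary": (500, 3.0),
}

# Every field counts equally, however many papers it has.
unweighted = sum(fwci for _, fwci in fields.values()) / len(fields)

# Every paper counts equally, so large fields dominate.
weighted = sum(n * fwci for n, fwci in fields.values()) / sum(n for n, _ in fields.values())

print(f"unweighted mean: {unweighted:.3f}")
print(f"publication-weighted mean: {weighted:.3f}")
```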

These averages also raised the question of how accurate, or even necessary, these lines are. The lines shown represent the mean quality of output, but citation counts for individual papers do not usually follow a normal distribution. Rather, they tend to fit a shifted power-law distribution better: the majority of publications may receive few or no citations, while a few highly cited publications receive ever more, because a frequently referenced publication becomes more widely known. This skewness increases over time, making these average lines less accurate, and the amount of skewness is also likely to differ between disciplines. A consequence of the skewed distribution is that the mean becomes a poor measure of the “average” value. For heavy-tailed distributions like a power law, the mean tends to be much larger than the median, so any statistic built on the mean will also be skewed. This makes it problematic to compare the average of New Zealand with the average of other small advanced economies, since the Field-Weighted Citation Impact does a poor job of normalising across fields.
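A short simulation shows how far the mean and median can drift apart in a heavy-tailed distribution. The code draws invented ‘citation-like’ values from a shifted power-law (Lomax) distribution using Python’s standard library. The tail exponent and sample size are arbitrary assumptions, not fitted to any citation data.

```python
import random
from statistics import mean, median

ALPHA = 1.5  # tail exponent of the toy distribution (an arbitrary choice)
rng = random.Random(42)

# paretovariate draws from a Pareto distribution starting at 1; subtracting 1
# shifts it to start at 0 (a Lomax, or "shifted power law", distribution).
sample = [rng.paretovariate(ALPHA) - 1 for _ in range(20000)]

print(f"sample mean:   {mean(sample):.2f}")
print(f"sample median: {median(sample):.2f}")
print(f"theoretical mean {1 / (ALPHA - 1):.2f}, median {2 ** (1 / ALPHA) - 1:.2f}")
```

With a tail exponent below 2 the variance is infinite. As a result, the sample mean jumps around noticeably from one seed to another while the median barely moves, which is one more reason averages of citation counts are fragile.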

Another difficulty in using citations to measure the quality of output is that, while papers are usually highly cited when other researchers agree with their statements or use them to support their own research, high rates of citation can also occur for controversial or incorrect statements. Unfortunately, there doesn’t seem to be a more effective way of measuring the quality of scientific publications, so for now we are stuck with our flawed metrics. Scientists do seem to love a sneaky approximation here and there.

*Featured image by XKCD comics.