Guest Blog

Inclusive learning environments for both Māori and Pākehā

Improving education for all is as uncontentious as political causes get here in Aotearoa; championing this issue is on the front burner of nearly every government body, institution, and organisation across the country. Compare this to the endless debate, controversy, resistance, and even hostility that meet attempts to address protocol and resolve for our Māori students.

He waka eke noa and ‘Auckland 2050’

This year’s research project at Te Pūnaha Matatini, ‘He waka eke noa’, is part of an ongoing research partnership with Ngāti Whātua Ōrākei that aims to highlight some features of our education system in Aotearoa—chiefly those affecting Māori—as part of Auckland Council’s greater 30-year vision, ‘Auckland 2050’.

He waka eke noa follows in the wake of last year’s research, ‘Analysis of Well-Being’, which endeavoured to interpret open responses of 684 registered hapū members, taken from surveys encompassing various forms of codified knowledge and kōrero in the hope of directing the development of a hapū well-being framework. Well-being was measured with regard to life satisfaction, education, housing, proficiency and abilities in te reo and, more generally, te Ao Māori. Last year’s well-being research proposal was awarded Marsden funding through the Ministry of Business, Innovation and Employment’s (MBIE) Te Pūnaha Hihiko: Vision Mātauranga Capability Fund, which invests in ‘the development of skilled people and organisations undertaking research that supports the themes and outcomes of our Vision Mātauranga policy.’

A major component of our research cannot be revealed here for privacy reasons, but for the purpose of sharing some of the motivations and ways we engaged with our research, we can look at the educational climate of Aotearoa as a whole, indigenous ways of knowing, and the shared stake we have as a community to improve mainstream education for our Māori students.

Using data collected by the Ministry of Education, we constructed the below graphs to show 2017 post-secondary completions by subject. We found that Māori students were either largely underrepresented or overrepresented across major subjects. Further, we saw more spread in subjects taken amongst other ethnicities while Māori students predominantly stayed in familiar domains—especially at earlier levels of study.

However, there seems to be more spread in subjects once Māori students reach Bachelor’s level. This may suggest that continuing in study affects the way Māori students engage with subjects, with students branching out into different fields after acquiring confidence in mainstream education institutions.

Research in keeping with te Ao Māori perspectives 

Owing to our shared identities as both Māori and Pākehā, as well as students, we attribute a lot of our strength throughout this project to our ability to appreciate the unique issues and challenges faced by students with shared membership. We also acknowledge the capabilities for dynamism and resilience. From the outset, a major goal was to utilise our respective disciplines for research while keeping the essence of te Ao Māori perspectives alive throughout. Our team engaged in quantitative analysis as well as discourse analysis, drawing on a multitude of texts that elaborated on indigenous taxonomy and concepts surrounding Mātauranga Māori.

Mātauranga Māori embodies a complex network of codified systems of knowledge transfer and storage to which universal constructs are framed in both past and present, existing and non-existing. Cognisance and assessment of information and protocol envelop moteatea (chants, poems), whaikorero (oratory, speechmaking), maramataka (calendar), waiata (songs), pepeha (quotations), whakataukī (proverbs), whakapapa (genealogies) and pūrākau (stories), each with its own categories, style, complex patterns and characteristics’ (Lee J, 2008).

This culturally-embedded system founded on kanohi kitea (face-to-face) interactions between individuals, whānau, hapū, and iwi is an all-encompassing body of knowledge based on evidence, culture, values and worldview. Despite being a rich form of gathering and sharing information, these traditional ways of knowing have often been considered incompatible with local pools of thought and sanctioned as illegitimate in the wider scientific and academic communities.

Reflecting on Aotearoa’s current educational climate, it’s hard to imagine that the above would not play a significant role in how Māori students interact with mainstream constructs of knowledge, as well as in the capacities of education providers, peers, and the public to appropriately assess and acknowledge learning systems so undivulged in the domains in which they usually operate. We might also consider how the role of whānau can be affected by the enormous task of accessing and providing tools relevant to outside industries of learning, and their ability to advocate on behalf of their tamariki far from the sources of support provided within their everyday communities. Even as Māori student education figures indicate improvements in academic participation, the alarmingly high SSEE rates (stand-downs, suspensions, exclusions and expulsions) at nearly all levels of study only corroborate the narrative that the current education system is failing. In falling short of its most essential obligation, to encourage and inspire student potential, the system in place is robbing its students of an equal chance to prosper. The damage done to the already strained communities of which these students are part seems to fuel a never-ending cycle that sets up generation after generation with even fewer opportunities to thrive.

Unfortunately, the displacement of Māori is not consigned to the past. Historical attempts to keep te Ao Māori outside of the local mainstream include legislation such as the Tohunga Suppression Act, the Education Ordinance Act and the Native Schools Act, and a nationwide ban on te Reo Māori, including ‘A wide range of punishments used against children who speak te Reo at school (including corporal punishment)’.

Amid cultural asphyxiation, Māori risk losing their voice, abilities to navigate on their own terms, and essentially – their mana. Even in moments of advocacy we are limited in our capacities. We look at issues surrounding Māori and the needs for address but from a Māori standpoint, there is a need for redress. An overwhelming loss of trust in the mainstream educational institutions prevails when we miss these opportunities of knowing.


In conclusion, institutions cannot remain mutually exclusive in inclusive learning environments. There is a demand for institutes for Māori and Pākehā across Aotearoa to foster a respectful, understanding and empathetic community. Culture is a learned system. Access to Māori systems of support in schools will elevate New Zealand children’s comprehension of how culture, language, and heritage empower their own identities. The immersion of these two systems of education will lead to a broader sentiment of cultural heritage and will encourage New Zealand’s future communities to practise more tolerance and acceptance of cultural diversity.

He waka eke noa in a literal sense translates to the ‘canoe which we are all in without exception’. For the purposes of this project, we might attribute this whakataukī to a collective consciousness and gentle reminder that, when we are in a waka, there is unity in a shared purpose. Here, we look to education.


Brianne (Bri) Halbert and Megan Liejh are students at the University of Auckland. Bri is pursuing a double major in Computer and Data Science, and Megan is completing a conjoint Law (Hons) and Arts Degree in Political Philosophy Law and Politics. While these disciplines may appear vastly different, they were able to find a lot of overlap and even harmony in their exploration of inclusive education for Māori.


Lee J. 2008, Ako: Pūrākau of Māori teachers’ work in secondary schools. [Unpublished PhD thesis]. Auckland, New Zealand: University of Auckland.

Designing a data management system for archaeological records

The Archaeology and Physics Departments at the University of Auckland, as well as contributors from other universities, have been collecting data on obsidian artefacts from the northern part of New Zealand. To date, this project has data on over 2,500 such artefacts, obtained from various sources ranging from historical studies of obsidian to more recent studies by current archaeologists at the University of Auckland. Part of the aim of this research is to look at “Social Network Analysis of Obsidian Artefacts and Māori Interaction in Northern Aotearoa New Zealand”, which is the title of a recent publication that involved my Te Pūnaha Matatini and industry supervisors.

Why study obsidian?

Obsidian is a volcanic glass which is found at several locations in New Zealand. It is hard and brittle, so that when a piece (called a flake) is broken off, it has sharp edges. This made it very useful as a cutting tool in pre-European New Zealand. By analysing the elemental composition of the artefacts, it can be determined where each artefact was sourced. By comparing this to the archaeological site at which each artefact was found, my supervisor Dr Dion O’Neale has been able to infer social networks of pre-European New Zealand. Dion analysed the geographical least cost paths and found that distance was not always the main factor in determining where each archaeological site sourced its obsidian flakes from. Therefore, a lot of information can be gained by analysing obsidian artefacts, and it is the aim of this research project to be able to infer this type of information, and even more, regarding pre-European Aotearoa New Zealand.
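To illustrate what a least cost path is, here is a minimal sketch in Python. It is not the method used in the study: the grid, the cost values and the function name `least_cost` are all made up for illustration. Each cell holds a hypothetical cost of travelling onto it, and Dijkstra's algorithm finds the cheapest route between two cells.

```python
import heapq


def least_cost(grid, start, goal):
    """Return the minimum total cost of travelling from start to goal.

    grid[r][c] is a hypothetical cost of stepping onto cell (r, c).
    Moves are allowed between neighbouring cells (up, down, left, right).
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        cost, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            return cost
        if cost > best.get((r, c), float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + grid[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    return None  # goal not reachable


# Made-up terrain: the middle column is expensive to cross (e.g. a range).
terrain = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
cheapest = least_cost(terrain, (0, 0), (0, 2))
```

In this toy terrain the two sites are only two steps apart, but the cheapest route goes the long way around the expensive column, which is the sense in which distance is not always the deciding factor.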

With so much varied data the need arose to have a central data infrastructure where all the various data records can be stored along with protocols to support data quality and provenance. This data needed to be accessible by various parties from various departments and universities.

The main steps I took to complete my internship project included:

  • Choosing and learning to use appropriate database software
  • Schema design
  • Data cleaning
  • Scripting for automated data uploading

These steps were not necessarily sequential and often ran in conjunction with each other. For example, since there was a variety of data sources, I came across new data fields while cleaning the data, in which case I had to edit the schema to reflect the new field. I also often came across discrepancies or unknown variables in the data, and had to wait to hear back from other people before I could proceed.

It surprised me how long it took me to design the schema, since data cleaning is usually the step that takes the longest. In some sense the schema design was part of the data cleaning, because while I was designing the schema I was also figuring out what data to keep and what not to keep. This greatly reduced the time it took for me to clean and format all the data tables to be ready for upload. Finding and learning to use an appropriate database platform also took a while. Finally, writing the scripts for automatic uploading to the database took a couple of weeks.
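The schema and upload steps can be sketched as follows. This is only an illustration under stated assumptions: the post does not name the database software or the actual tables, so SQLite, the `site` and `artefact` tables, the column names and the `upload` function are all hypothetical, and the sample rows are made up.

```python
import csv
import io
import sqlite3

# Hypothetical schema: one table of sites, one table of artefacts,
# with a column recording where each record came from (provenance).
SCHEMA = """
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE artefact (
    artefact_id INTEGER PRIMARY KEY,
    site_id     INTEGER NOT NULL REFERENCES site(site_id),
    source      TEXT,
    data_origin TEXT NOT NULL
);
"""


def upload(conn, csv_file):
    """Load cleaned CSV rows into the database and return the row count."""
    inserted = 0
    for row in csv.DictReader(csv_file):
        site = row["site"].strip()
        # Create the site on first sight, reuse it afterwards.
        conn.execute("INSERT OR IGNORE INTO site (name) VALUES (?)", (site,))
        site_id = conn.execute(
            "SELECT site_id FROM site WHERE name = ?", (site,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO artefact (site_id, source, data_origin) VALUES (?, ?, ?)",
            (site_id, row["source"].strip() or None, row["data_origin"].strip()),
        )
        inserted += 1
    conn.commit()
    return inserted


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
sample = io.StringIO(
    "site,source,data_origin\n"
    "Site A,Source X,historical study\n"
    "Site A,,recent study\n"
    "Site B,Source X,recent study\n"
)
n_rows = upload(conn, sample)
n_sites = conn.execute("SELECT COUNT(*) FROM site").fetchone()[0]
```

Keeping sites in their own table means a site name is stored once, and an unknown obsidian source is stored as NULL rather than as an empty string, so it can be followed up later.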


Kate is currently studying for a Master of Applied Data Science (MADS) at the University of Canterbury. 

Machine learning for te reo Māori

For 10 weeks over the 2018-19 summer, I was involved in a project with Te Hiku Media and Dragonfly Data Science to aid in the development of a Māori voice assistant. The motivation for this project was to make Te Reo Māori more accessible and fun in the digital age.

During my internship I created a “box” called Rapere (the translation of “Raspberry” into Te Reo Māori) containing a Raspberry Pi computer which is connected to the internet, some lights, a speaker and a microphone. This box has been coded to listen continuously for speech, and when speech is detected it records what is being said until there is a longer break in the speaking (this file is overwritten each time a recording is made).

Cherie’s recording equipment set-up.

The recording is transcribed using Te Hiku Media’s Application Program Interface. The transcription that is returned to the box is compared to some key words which mean the speaker is likely asking to hear the news, to listen to the radio, or to stop playing. If these are heard, then the news or radio stream is played or stopped; otherwise the box goes back to listening for these phrases. The phrase “kia ora” lights up an LED for a few seconds. The box is able to listen for commands while playing audio, which allows the user to stop audio playing. The project was documented and all the code uploaded online to allow other developers at Te Hiku Media to progress it further and demonstrate the abilities of the Rapere box.
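The keyword comparison described above could look something like this minimal Python sketch. Apart from “kia ora”, the keywords are illustrative placeholders, since the post does not list the phrases the box listens for; `COMMANDS` and `match_command` are hypothetical names, and no call to the transcription API is shown.

```python
import unicodedata

# Placeholder keywords (assumptions, except "kia ora" which is from the post).
# "stop" is listed first so that it wins if a phrase also mentions the radio.
COMMANDS = {
    "stop": ("kāti", "whakamutu"),
    "news": ("rongo kōrero",),
    "radio": ("irirangi",),
    "greeting": ("kia ora",),
}


def normalise(text):
    """Lowercase, tidy whitespace and use one Unicode form for macrons."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.lower().split())


def match_command(transcript):
    """Return the action whose keyword appears in the transcript, or None."""
    text = normalise(transcript)
    for action, keywords in COMMANDS.items():
        if any(keyword in text for keyword in keywords):
            return action
    return None
```

For example, `match_command("Kia ora")` returns `"greeting"`, which the box could map to lighting the LED, while a transcript with no keyword returns `None` and the box simply keeps listening.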

I experienced a great feeling of accomplishment from my work with Te Hiku Media and Dragonfly Data Science. Going from a bunch of components and an empty Raspberry Pi computer to a working program that uses two different APIs and plays the news when I correctly say the appropriate phrase in Te Reo was more than I thought I would be able to achieve, and I am proud of what I achieved with the help of my supervisors. I am grateful to Te Pūnaha Matatini for providing me with the opportunity to have this internship.


Cherie Vasta is a recent Statistics & Psychology graduate of Victoria University of Wellington who is starting a new role at an actuarial consulting company in Auckland. She enjoys applying data science techniques to all kinds of problems, especially those involving people and the way they think.

Streamlining the process of social network analysis

Romalee Amolic talks about her 2018-19 Te Pūnaha Matatini Summer Internship with AgResearch where she worked on a project to enhance social network analyses of biosecurity information in the New Zealand tourism industry, so that such analyses can be conducted faster and more effectively in the future.

Social Network Analysis (SNA) is a powerful data analysis technique which often helps in identifying hidden relationships and other critical information in a communication network. Data for an SNA can be collected from various sources, which may mean that pre-processing and cleaning take far longer than the network analysis itself. Hence, this project aimed to use data carpentry to streamline the use of social research data (e.g. collected through surveys) so that social network analyses can be conducted more quickly and effectively in the future.

Better Border Biosecurity case study

The Better Border Biosecurity project, a multi-partner, cooperative science collaboration which analyses the exchange of biosecurity information in the New Zealand tourism industry, is used as a case study to develop and test methods which streamline the SNA process. The data for this project includes 154 responses from tourism providers across New Zealand who named up to 3 sources from whom they seek or receive biosecurity information. Information about the location and role of the respondents was collected. Some additional questions were also included in the survey such as the form and frequency of communication, the usefulness of information and the trust between parties.

This information was then used to perform Social Network Analysis in Gephi – a powerful interactive social network analysis tool. However, the survey data had to be first converted into a format fit for network analysis. The conventional approach for cleaning the data is discussed below.

Conventional approach

The survey responses were collected in Excel sheets. The data pre-processing and cleaning was done manually using Excel.

Problems with the conventional approach

  • It involved dealing with the data manually which was an extremely time-consuming process and needed about 1-3 weeks depending on the complexity of the data.
  • The process was prone to human errors which reduced the potential of the data.
  • It required skilled labour for an extended amount of time and hence, increased the costs involved.
  • It led to data inconsistencies.

Hence, taking all these problems into consideration, a generic automated process was developed to clean the data as discussed in the following sections.

Data Cleaning and Pre-processing

In a network, each node represents a unique identity. Hence, the most important task in cleaning the data was to recognise and remove inconsistencies in the names which occurred due to the textual nature of responses. The following techniques were used to clean the data in Python:

  • Initially, all the names were made lowercase for the analysis.
  • Special characters were removed.
  • Rows containing missing or no information were removed.
  • Trailing or unnecessary white spaces were removed.
  • Incorrect spellings were identified and corrected using a spellchecker. The challenge here was to differentiate between proper nouns (such as ‘EcoZip’) and dictionary words (such as ‘Adventures’). For example, the entity name ‘EcoZip Adventures’ was misspelled as ‘EcoZip Adventres’. A conventional spellchecker would flag ‘EcoZip’ as a spelling error along with ‘Adventres’, as neither word is found in the dictionary. Hence, a solution was developed to distinguish proper nouns from actual dictionary words in entity names and correct spelling errors in the data.
  • A custom algorithm was developed to identify abbreviations in the text and replace them with the full names, e.g. ‘DOC’ was identified as ‘Department of Conservation’.
  • Several names which were written slightly differently but referred to the same entity were identified and merged. This is the most significant part of the process or the most “satisfying” part, as a user described it. An example is shown below:

  • All the names in the network were also compared pairwise to further remove any inconsistencies and generate a list of consistent and unique names involved in the biosecurity information exchange.
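Several of the cleaning steps listed above can be sketched in a few lines of Python. This is not the application developed in the project: it swaps the spellchecker for a plain string-similarity comparison from the standard library's `difflib`, and the abbreviation table, the 0.9 threshold and the function names are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; the 'DOC' example is from the text above.
ABBREVIATIONS = {"doc": "department of conservation"}


def normalise(name):
    """Lowercase, strip special characters and extra spaces, expand abbreviations."""
    name = name.lower()
    name = re.sub(r"[^\w\s]", "", name)  # remove special characters
    name = " ".join(name.split())        # remove trailing and repeated spaces
    return ABBREVIATIONS.get(name, name)


def merge_similar(names, threshold=0.9):
    """Map each raw name to a canonical name, merging near-identical spellings."""
    canonical = []
    mapping = {}
    for raw in names:
        clean = normalise(raw)
        if not clean:
            continue  # drop rows with missing or no information
        # Pairwise comparison against the names accepted so far.
        match = next(
            (c for c in canonical
             if SequenceMatcher(None, clean, c).ratio() >= threshold),
            None,
        )
        if match is None:
            canonical.append(clean)
            match = clean
        mapping[raw] = match
    return mapping


raw_names = [
    "EcoZip Adventures",
    "ecozip adventres ",
    "DOC",
    "Department of Conservation!",
    "",
]
mapping = merge_similar(raw_names)
```

In this sketch the first spelling seen becomes the canonical one, so the order of the input matters; a fuller version might instead pick the most frequent spelling in each group.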

This cleaning process reduced the entities from 319 (including inconsistencies) to 139 consistent and unique entities (nodes) in the network with 335 relationships (edges) between them which were then used to generate visualisations.
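Once names are consistent, the survey responses can be turned into an edge list for Gephi. The sketch below is an assumption about how that might be done, not the project's code: `write_edge_list` and the sample responses are hypothetical, and the edge direction (from the named source to the respondent) is a choice made here for illustration. The `Source` and `Target` column headers are the ones Gephi's spreadsheet import looks for in an edge table.

```python
import csv
import io


def write_edge_list(responses, out_file):
    """Write unique directed edges as CSV and return how many were written.

    responses is an iterable of (respondent, sources) pairs, where sources
    is a list of up to three cleaned names the respondent gets information from.
    """
    edges = set()
    for respondent, sources in responses:
        for source in sources[:3]:
            if source and source != respondent:
                # Assumed direction: information flows source -> respondent.
                edges.add((source, respondent))
    writer = csv.writer(out_file)
    writer.writerow(["Source", "Target"])
    for edge in sorted(edges):
        writer.writerow(edge)
    return len(edges)


# Made-up responses using already cleaned names.
sample_responses = [
    ("operator a", ["department of conservation", "council"]),
    ("operator b", ["department of conservation", "", "operator b"]),
]
buffer = io.StringIO()
n_edges = write_edge_list(sample_responses, buffer)
```

Collecting edges in a set removes duplicate relationships, and blank answers and self-references are skipped so they do not appear as spurious nodes in the map.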

Social Network Analysis

Directed maps were generated using Gephi which were then further analysed. An anonymised example of one of the social network maps generated is shown below:

An example of anonymised social network map generated using Gephi.

The information obtained through this network analysis can now be used by biosecurity providers to better target information exchange within the New Zealand tourism industry.


  1. This application significantly reduced the time spent cleaning and pre-processing the data before analysis (from 1-3 weeks to a few hours or minutes).
  2. As a result, the costs involved in the conventional extensive processes, which involved a lot of manual effort, were also reduced.
  3. The new streamlined process almost eliminated the human errors involved in the manual inspection of data.


Hence, through this case study, an application was developed which streamlines and automates all the steps, from loading and cleaning the data up to the generation of data sheets to be used in the SNA. Although this is currently a Python application, the development of a GUI-based interactive SNA application design is under consideration.

I would like to acknowledge Helen Percy, my industry supervisor and Penny Payne, the social scientist at AgResearch for their invaluable support during this project.


Romalee Amolic is a Master of Applied Data Science student at the University of Canterbury. In February 2018, she completed her summer internship with AgResearch, Hamilton. She thoroughly enjoyed her internship project which involved streamlining and increasing the efficiency of the data cleaning and network map generation processes at AgResearch. She is passionate about harnessing the power of data analytics to improve the lives of people. She eagerly looks forward to applying the skills learnt, in fulfilling her aspiration of becoming a data scientist.


Romalee (centre) with Helen Percy, her industry supervisor (right), and Penny Payne, the social scientist at AgResearch (left).

Māori and Pacific Island women in science

Before I started working as a research assistant on the Hidden Networks project, the only woman from the history of New Zealand science I could name was Joan Wiffen, the “dinosaur lady” who discovered New Zealand’s first dinosaur fossils in Hawke’s Bay. She was a remarkable woman who contributed much to palaeontology here in New Zealand; she was also, incidentally, very white. I too am outwardly (that is, I pass as) very white. But as a mixed-race woman of Samoan descent, when I started this project I was very interested to learn about the contributions of non-Pākehā – chiefly, Māori and Pacific Island – women to science in Aotearoa. For the purposes of my research, I’ve taken “woman in science” to broadly mean a woman who has made a contribution to science in New Zealand, including both professional scientists with academic backgrounds and amateur scientists who have added to the pool of knowledge in their field, like Joan Wiffen.

The more I researched, the whiter the history of women in science in New Zealand came to look. Unsurprising really: according to Elizabeth McKinley, in 1998 just 1.5% of total employees at seven Crown Research Institutes in New Zealand identified as Māori women; there were none in management positions, and only two scientists. In ‘Finding Matilda’, Kate Hannah notes that “the historiography of science in New Zealand … tends to inadvertently reinforce [the] camouflage” of women. They are marginalised, but not absent: if you go looking, as I have, you’ll find a staggering number of women in New Zealand science from the 14th century to the present day. Yet from the beginnings of European presence in New Zealand, the overwhelming majority of these women were white. A feminist revisionist history of science aims not only to make science less male-centric (i.e. demonstrate, through promotion of women’s work both quantitatively and qualitatively, that science never has been just a man’s world) but also to make it less monochromatic (so to speak), which means celebrating the scientific achievements of brown women in New Zealand’s history, and showing that science never has been just a white world either.

In fact, the first women who made scientific contributions in Aotearoa were not Pākehā but Māori. I was delighted to learn of Whakaotirangi, who in the 1300s “was responsible for safeguarding the seed of the kūmara” as the Tainui Waka journeyed to Waikato. She was the wife of Hoturoa, the leader of the Tainui Waka migration from Hawaiki to Aotearoa, but also an important historical figure in her own right. In ‘Whakaotirangi: A Canoe Tradition’, Diane Gordon-Burns and Rāwiri Taonui explore how her importance has been diminished in post-European contact accounts of the Tainui migration. Tainui and Te Arawa traditions both speak of Whakaotirangi: she appears to be a noble and important ancestor in the history of both iwi. While she is most remembered for bringing kūmara to Waikato, she was also responsible for a number of other plants brought from Hawaiki. On arrival in Waikato, Whakaotirangi built gardens in which she experimented with growing and tending to a variety of plants, both for sustenance and medicinal purposes. She discovered how to make the kūmara, which had come from a much warmer climate, grow in the cooler land her people had settled. Her work was crucial for the establishment of the Tainui people: it provided them with a reliable food supply as they adjusted to life in a new land. She was also involved in commissioning, building and launching the Tainui canoe. Her profile on the Royal Society of New Zealand website, as part of their series 150 Women in 150 Words, credits her as “one of New Zealand’s first scientists”.

Around the middle of the 1400s, another important ancestor of the Waikato people appeared. Kahu (also known as Kahupeka, Kahupekapeka, Kahukeke, or Kahurere) was a Tainui woman who experimented with plants – such as harakeke, koromiko, kawakawa and rangiora – as medicinal remedies. She did so during her great journey: walking inland through the King Country while grieving the death of her husband (who in some accounts is Rakataura; in others Uenga). She gave names to different sites along her journey (such as Te Manga-Wāero-o-Te Aroaro-ō-Kahu – ‘the stream in which Kahu’s dogskin cloak was washed’) – these names tell the story of her journey and preserve the history of the land. At some point during her journey she was ill, which may have been why she sought out plants for their medicinal properties. Unfortunately there are many different versions of Kahupeka’s story, and in them there are few mentions of her medicinal experimentations with indigenous flora. In some versions Rakataura doesn’t die, and he and Kahu traverse the countryside naming places together, as explorers.

In Māori culture, practitioners or experts in any skill or art are known as tohunga. The Tohunga Suppression Act 1907 made tohunga status a punishable offence. The Act was repealed only in 1962, and so much of the knowledge surrounding this customary way of knowing has been suppressed – my search for tohunga wahine (female practitioners) who might count as women of science has not produced significant results. However, it is worth noting that the sources I accessed relied upon the written record. Other sources, such as Māori oral histories, may be much more fruitful.

The next Māori woman in science that I was able to find wasn’t born until the 19th century. Makereti Papakura (Margaret Pattison Thom; she also went by Maggie and was of Te Arawa and Tuhourangi iwi) was born to a Māori mother and an English father in the Bay of Plenty in 1873. She was raised by her mother’s aunt and uncle in Parekarangi, a rural area. She didn’t learn English until she was ten years old, speaking only Māori until her father took over her education. After her schooling, Papakura moved to Whakarewarewa, where she became an accomplished tourist guide. She gave herself the surname Papakura after a nearby geyser when a tourist she was guiding asked if she had a Māori surname. Clearly, the name stuck. In 1891 she married surveyor Francis Joseph Dennan; they had one child together before divorcing in 1900. In 1905 she wrote Guide to the hot lakes district. Papakura travelled to England in 1912, and married Richard Charles Staples-Browne. She had first met Staples-Browne when he was on a tour of New Zealand, and had reconnected with him while she was part of a Māori tour party in England. They divorced in 1924, but Papakura remained in England and in 1926 she enrolled at Oxford University, studying a BSc in anthropology. She died on April 16, 1930, only two weeks before her thesis, The old-time Māori – in which Papakura combined customary knowledge with scholarly conventions – was due to be examined. It was published posthumously, eight years later. Her thesis covers Māori social and familial structures, housing, weaponry and relationship with fire. She was meticulous in her writing, and wrote letters to her people in New Zealand during her drafting process, to ensure her account was as accurate as possible.

Bessie Te Wenerau Grace (1889-1944; Ngāti Tūwharetoa) was the first Māori woman university graduate, graduating from Canterbury University with a BA in 1926. She was the granddaughter of Ngāti Tūwharetoa chief Horonuku Te Heuheu. She then went on to receive an MA with first-class honours in modern languages from London University. In London she also became a nun, Sister Eudora. She worked as headmistress of St Michael’s School in Melbourne. In 1945, Dame Mira Szászy (1921-2001; Ngāti Kurī, Te Rarawa, and Te Aupōuri), a prominent Māori leader, became the first Māori woman to graduate with a degree from the University of Auckland. She went on to complete a postgraduate diploma in social sciences from the University of Hawaii and worked hard to improve the welfare of Māori women throughout her life. In 1949, Rina Winifred Moore (1923-1975; Ngati Kahungunu, Rangitane and Te Whanau-a-Apanui) graduated from the University of Otago with a Bachelor of Medicine and Bachelor of Surgery – and in so doing, became the first Māori woman doctor in New Zealand. In her career she worked to improve public perceptions of the mentally ill and was one of the first doctors in New Zealand to prescribe the contraceptive pill.

It has been harder for Māori and Pacific Islanders to enter scientific professions, as they are forced to combat social prejudices that expect them to fail – that tell them this is not where they belong. It has been harder for women to enter scientific professions because, again, they have to fight against the social biases that tell them ‘this is not your world’. Until the late 20th century, many women were expected to give up their careers when they married – motherhood and the domestic sphere became their full-time responsibilities. Some women chose to remain unmarried and childless in pursuit of scientific careers, while others stopped working when they married. Māori and Pacific women have to fight both gender and racial biases for their place in the world of science. This has been the case throughout the post-contact history of Aotearoa, and continues to be so.

Dr Ocean Mercier. Image courtesy of Dr Mercier and Image Services, Victoria University of Wellington.

Today, there are increasing numbers of Māori and Pacific Island women in science, with some of them working at the intersection of traditional knowledge and western science. Dr Ocean Mercier (Ngāti Porou) is a Senior Lecturer in Māori Science (the intersection of western science and mātauranga Māori) at Victoria University of Wellington. She has a PhD in Physics and was awarded the New Zealand Association of Scientists (NZAS) inaugural Lucy Cranwell Medal (previously the Science Communicators’ Medal) in 2017. Science researcher Hokimate Harwood (Ngāpuhi) combines western scientific and Māori customary knowledge in her research of the feathers in kahu huruhuru (feather cloaks). Her use of microscopy to identify the origins of feathers used in precious cloaks has been pioneering. She is a Bicultural Science Researcher at Te Papa. Her sister, Dr Matire Harwood (Ngāpuhi; PhD MBChB), is a Senior Lecturer at the University of Auckland Medical School and has done crucial research into indigenous healthcare throughout her career. Her efforts have been widely recognised, and in 2017 she was awarded a fellowship to the L’Oréal UNESCO For Women in Science programme.

Dr Hiria McRae. Image courtesy of Dr McRae and Image Services, Victoria University of Wellington.

Victoria University science educator Dr Hiria McRae (Te Arawa, Tūhoe, Ngāti Kahungunu) has created and developed a new educational model aimed at raising Māori students’ engagement in high schools. Through her research projects she has made important contributions to the field of Māori education.

Dr Pauline Harris. Image courtesy of Dr Harris and Image Services, Victoria University of Wellington.

Victoria University astrophysicist, science lecturer and research fellow Dr Pauline Harris (Rongomaiwahine and Ngāti Kahungunu), who has a PhD in astroparticle physics, is a key figure in the revitalisation and teaching of Māori astronomy. She is also involved in the search for extra-solar planets. Connected to Harris’s Māori astronomy programme is Pounamu Tipiwai Chambers, an undergraduate student at Victoria University who has employed Māori astronomical and navigational knowledge in undertaking waka voyages across the Pacific.

Another remarkable young woman, Alexia Hilbertidou (of Greek and Samoan descent), has founded GirlBoss New Zealand, an organisation aimed at the empowerment of young women in STEM studies after she felt alienated as the only girl in her year thirteen physics for engineering class. She was also part of NASA’s SOFIA project, making her the youngest person ever to be part of a NASA mission.

My blog post aims to contribute towards the unmasking of Māori and Pacific women’s contributions to science in both historical and contemporary landscapes. We are already seeing some important changes: many Māori women in science today combine customary and scientific knowledge to great success, a road paved by Makereti Papakura and her BSc thesis. However, Māori and Pacific women are still dramatically under-represented in fields of science, particularly at senior and management levels. It is therefore important that we keep up the momentum of positive change not only by looking forward but also by looking back: the successes of past figures provide an encouraging bevy of ‘shoulders to stand on’ for women in science today.

This post was written as part of my summer scholarship research on the Hidden Networks project, supervised by Rebecca Priestley and Kate Hannah.

Further reading

If you’re interested in learning more about the women I’ve mentioned, you might enjoy some of these sources:


Beth Rust is a BA(Hons) history graduate from Victoria University of Wellington. For her Honours thesis she researched the writings of Christine de Pizan, a 15th-century humanist and early defender of womankind. For the past three months she has been working as a research assistant on the project ‘Hidden Networks: hybrid approaches for the history of science’. Beth is just about to start a job in the public service, and she is very excited to take the skills she has learned from her summer research into her new role. She loved being a summer scholar.

How machine learning can perpetuate racism


I wrote this algorithm to classify people by gender, but one of the biggest things I learned was how machine learning can reinforce racism and perform poorly on ethnic minorities.

Machine learning – or programs that are able to learn from and improve on past experience and data – is often accused of reinforcing human biases such as racism and sexism. However, it can be a bit unclear how exactly this happens.

How does an automatic soap dispenser fail to recognize black people’s hands? How does image recognition software come to classify people in kitchens as women, regardless of their actual gender? How does artificial intelligence that seeks to predict criminal recidivism produce results that are consistently biased against black people?

This walk-through hopes to give you a bit of an insight into one example of racism in machine learning, and how this comes to be.

The algorithm will be used as part of research into gender equity in STEM fields in New Zealand. A lot of information about who works in certain research centres or who graduated from university is publicly available online (for example, here are university records from NZ between 1870 and 1961), but it doesn’t explicitly include their gender. While a person reading the information can usually guess their gender quite easily and with a high degree of accuracy, it’s obviously very impractical to read and classify thousands or hundreds of thousands of observations. This is where this algorithm hopes to simplify and speed up the process of identifying women in STEM fields.

Training and testing data: Selecting appropriate data

Getting good data for the training and test sets is a really important part of machine learning. Your model is only as good as the data you train and test it on, so getting this right is key.

The starting point of my dataset is the 100 most common names for boys and girls born in New Zealand in each year, going back to 1954. One major drawback of this dataset is that it only includes people born in New Zealand, not those who immigrated here. This means the dataset is almost exclusively made up of Anglo-Saxon names, and does not reflect New Zealand’s large Asian and Pacific populations.

It also doesn’t include any Māori names, presumably because the Māori population isn’t large enough for these names to make the top 100 list. I’ve tried to remedy this by adding the top 20 Māori names for boys and girls from several years to the dataset. However, 91% of the training dataset is still made up of Anglo-Saxon names, while only 9% is made up of Māori names.

These biases in the training dataset mean that the model is likely to recognize the patterns that indicate gender in Anglo-Saxon names, while not picking up on patterns that indicate gender in the names of other cultures. The same biases in the testing dataset mean that the accuracy of the model probably only applies to Anglo-Saxon names, and that it may do much worse on names of other nationalities.  

Selecting useful features for the algorithm

It’s important to consider what features would be most useful in predicting the desired classes. I started off by using the last letter of each name to predict gender. Most Anglo-Saxon names for men end with a consonant, while most Anglo-Saxon names for women end with a vowel.

There are also some pairs of letters that are more common for one gender than the other. For example, the last letter ‘n’ is indicative of a male name (e.g. Brian, Aidan, John), but the suffix ‘yn’ is indicative of a female name (e.g. Robyn, Jasmyn). Because of this, using both the last letter of each name and the suffix as features results in higher accuracy than just using the final letter. This gave me an accuracy of about 73% on a testing dataset that includes both Anglo-Saxon and Māori names.
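To make the two features concrete, here is a minimal sketch of how they could be pulled out of a name. The function name `name_features` is hypothetical, and treating the suffix as the last two letters is an assumption based on the ‘yn’ example; the original code for this post is not shown here.

```python
def name_features(name: str) -> dict:
    """Return the last letter and the two-letter suffix of a name.

    Assumption: the 'suffix' is the final two letters, as in 'yn' for Robyn.
    """
    cleaned = name.strip().lower()
    return {
        "last_letter": cleaned[-1:],  # e.g. 'n' for Brian
        "suffix": cleaned[-2:],       # e.g. 'yn' for Robyn
    }


# Two Anglo-Saxon and two Māori names mentioned in this post
for example in ["Brian", "Robyn", "Tane", "Aroha"]:
    print(example, name_features(example))
```

Note that Tane and Aroha both end in a vowel, so the last-letter feature alone cannot tell them apart by gender.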

This overall accuracy is lower than it would have been on a testing dataset made up of only Anglo-Saxon names because these features don’t perform as well with names of other origins. In a New Zealand context, this causes the most problems with Māori names. Most Māori names end in vowels, regardless of gender (examples of male Māori names include Tane and Nikau, while female Māori names include Aroha and Kaia). This means this particular feature doesn’t do a very good job with names of Māori origin.

The same problem would likely apply to other ethnicities, too. For example, Japanese, Chinese, Vietnamese, Italian and Hispanic names all often end in vowels, regardless of gender.

Imbalanced classes and the problems they cause

Imbalanced classes, or classes that are very different in their size, can also create problems for machine learning algorithms. In this case, ethnicity is an imbalanced class that is likely to influence people’s names. In the 2013 census, 74% of New Zealanders identified as European, 15% as Māori, 12% as Asian and 7% as Pacific. (Note that Statistics New Zealand allows you to identify with more than one ethnicity, therefore these numbers don’t add up to 100%).

Imbalanced classes often result in high accuracy within the majority class (in this case, European) and low accuracy within the minority classes (Māori, Asian and Pacific). This particular algorithm has an overall accuracy of about 73%. The accuracy within Māori names is about 69%, while the accuracy within European names is 75%.

The class imbalances in the data explain why the overall accuracy may not be a very good way of assessing whether the algorithm is working well. As well as checking the accuracy within each subgroup, it can be a good idea to look at precision and recall for more information on where the algorithm is doing well and where it’s doing poorly.
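Checking accuracy within each subgroup only takes a few lines. The sketch below is one way to do it, assuming each test record carries a group label, the true gender and the predicted gender. The function name and the six toy records are made up for illustration and are not the data behind the figures in this post.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy separately for each group.

    records: iterable of (group, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


# Toy records, invented purely to show the calculation
toy_records = [
    ("European", "F", "F"),
    ("European", "M", "M"),
    ("European", "M", "F"),
    ("European", "F", "F"),
    ("Māori", "M", "F"),
    ("Māori", "F", "F"),
]
print(accuracy_by_group(toy_records))
```

A single overall accuracy for these toy records would be 4 out of 6, which hides the gap between the two groups.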

Precision tells us how much of a classified group actually belongs to that group. In this case, for example, precision of female names is the percentage of names classified as female that are actually female. It is calculated by dividing the number of true positives (number of women classified as female) by the number of all positives (number of women and men classified as female).

Recall is the percentage of a particular group that has been classified as belonging to that group. For example, recall of male names is the percentage of male names that have been classified as male. Recall is calculated by dividing the number of true positives (number of men classified as male) by the number of true positives and false negatives (number of men classified as female).
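The two definitions can be sketched in code as follows. The counts used here are an assumption: they are one set of whole numbers that reproduces the rounded precision and recall reported for the 23 Māori names in this post (17 female, 6 male), not a confusion matrix published by the author.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of names classified into a group that really belong to it."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of names in a group that were classified into that group."""
    return true_pos / (true_pos + false_neg)


# Assumed counts, chosen to be consistent with the Māori-names figures
female_as_female = 15  # women classified as female
female_as_male = 2     # women classified as male
male_as_male = 1       # men classified as male
male_as_female = 5     # men classified as female

female_precision = precision(female_as_female, male_as_female)
female_recall = recall(female_as_female, female_as_male)
male_precision = precision(male_as_male, female_as_male)
male_recall = recall(male_as_male, male_as_female)

print(round(female_precision, 2), round(female_recall, 2))
print(round(male_precision, 2), round(male_recall, 2))
```

With these counts, a false positive for one gender is a false negative for the other, which is why the male scores suffer so badly when most names get classified as female.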

The tables below show the precision, recall and a couple of other metrics on how well the algorithm is doing. The differences between the overall table and the tables by ethnicity show that it’s likely that this algorithm is systematically worse with non-Anglo-Saxon names, specifically Māori names in this instance.


Overall:

            precision  recall  F1 score  support
F           0.77       0.76    0.77      274
M           0.71       0.72    0.72      226
avg/total   0.74       0.74    0.74      500

For Māori names only:

            precision  recall  F1 score  support
F           0.75       0.88    0.81      17
M           0.33       0.17    0.22      6
avg/total   0.64       0.70    0.66      23

Here we can see that both precision and recall are very low for male Māori names. This means that only a small percentage of the names classified as being male actually are male (low precision) and an even smaller percentage of male Māori names have been classified as being male (low recall).

This is probably because most Māori names end in vowels, regardless of their gender. The algorithm does alright on female Māori names, because it has seen many instances of female names ending in vowels before. But it hasn’t seen many male names ending in vowels, so it fails to classify most of these names correctly.

For European names only:

            precision  recall  F1 score  support
F           0.82       0.72    0.77      140
M           0.70       0.81    0.75      115
avg/total   0.77       0.77    0.77      255

Because machine learning algorithms with imbalanced classes usually do worse in the smaller classes, they can further marginalise minority groups by routinely misclassifying them or failing to take into account patterns that are unique to the smaller group. In this example, this is likely to be the case with ethnic minorities.

It seems that this algorithm is likely to really only do a good job on Anglo-Saxon names. This limits the situations in which it would be appropriate to use it, and risks reinforcing Eurocentricity and a focus on whiteness.

This example shows how difficulties in getting hold of representative datasets, selecting features and unbalanced classes can cause algorithms to perform poorly on minority groups. These are only a couple of the many ways machine learning can contribute to the marginalisation of minorities, and it’s important to consider how this might happen in the particular algorithm you’re working on.

The consequences of bias in machine learning can range from the irritation of not being able to get soap out of an automatic dispenser, to the devastation of being given a longer prison sentence. As these algorithms become more and more ubiquitous, it is essential that we consider these consequences in the design and application of machine learning.

See this paper for a more detailed look at how imbalanced classes affect machine learning algorithms.


Emma Vitz is a recent Statistics & Psychology graduate of Victoria University who is starting a new role at an actuarial consulting company in Auckland. Emma enjoys applying data science techniques to all kinds of problems, especially those involving people and the way they think.

Bad Science and Beyond


“I just don’t care!” the doctor said, in response to a query on a personal health-related issue during a recent New Zealand television interview.

I certainly had never heard a doctor speak in this way – and it made me curious to hear more in his live talk.

I was not disappointed. The thought-provoking comments kept coming: “The right way to build public trust is to earn public trust, and to share data,” Ben Goldacre said, and, “The paradigm of medicine has somewhat shifted”.

In his unique and energetic style, Dr. Ben Goldacre got his message across to the audience at the Mercury Theatre in Auckland this September. A British physician, researcher, columnist, and author, Dr. Goldacre has made it his mission to tackle “bad science”, whether it is used by drug companies, politicians, journalists, or researchers.

Dr. Goldacre explained the misrepresentation of the research life cycle (objective – data collection – data analysis – publication – evaluation), showing the audience the easiest ways to misrepresent science and the shortcomings of medicine.

Using statistical data taken from newspapers, advertisements, and research reports, Dr. Goldacre showed the audience how so-called “in-depth scientific research” can be used as a clever marketing tool and how defined research objectives can often be driven by profit.

Ben Goldacre - Bad Science

Photo credit: Dr Michelle Dickinson

One telling example of misrepresented data is the sudden drop in the number of tonsillectomies carried out on children in Hornsey, North London. There was a big discrepancy before and after 1929 – that is, from a few hundred cases down to almost nothing. After some investigation, it was revealed that the decline in the number of tonsil operations coincided with the retirement of one individual medical officer at a particular school, replaced by someone with a different opinion as to the merits of the treatment. Such a case reveals the power of doctors’ choices, rather than patients’ needs. It also shows how unreliable data can be if the full picture is not disclosed.

Sharing a more recent case, Dr. Goldacre discussed statins, medicines that lower cholesterol in the blood. Many cholesterol-lowering treatment options have been tested against a placebo, but they have not been tested against one another to determine “real world effects”, including death. Dr. Goldacre and his team approached the UK National Health Service (NHS) to collect patient information. If patients agreed, doctors would be able to randomize the treatment options, ultimately finding the optimal treatment with considerably fewer resources than the traditional “door knocking” data collection method. Nevertheless, this was thwarted by opposition from ethics groups arguing that patients should have a choice.

Dr. Goldacre’s presentation led me to wonder: what role does the general public play in formulating bad science? Surely it is not just the domain of motivated organisations, unreliable researchers, and sensationalist media. Bad science can be, and often is, disseminated by people from all walks of life.

How can we stop its spread? Transparency may be the answer – including that of research objectives, processes, and publication. If research objectives are set for the benefit of all, or purely for the improvement of a company’s bottom line, the public should know – and in a language they can understand. Let consumers make up their own minds.

Silencing Science - Shaun Hendy

Unfortunately, providing scientific data to the wider public and expecting people to reach their own conclusions may not be sufficient. Effective communication is also a critical element in combatting bad science. Nowhere is this more eloquently stated than in Professor Shaun Hendy’s timely book, Silencing Science: “The job of the scientist is not just to deliver the facts, but also to engage democratically to assist the community to weigh the full breadth of evidence” (p96).

Indeed, scientists and research providers should communicate well, working together in the best interests of the public.

As a consumer and a citizen in a world of information overload, it can be easy to be misled by a well-packaged data snapshot. We need to embrace a reliable and complete picture, and in terms we can understand. This will allow us to make our own choices in areas as broad as health, education, career, life-style, and more.

As a student of science and maths, I am beginning to grasp the moral imperative of the scientific community. Scientists should not only be answerable to their fund providers, but to everyone. A good start would be the publication of research findings representing the whole truth.


Tristan Pang

14-year-old Tristan Pang is a maths and physics major at the University of Auckland. He is also the creator of Tristan’s Learning Hub, producer and broadcaster of Youth Voices, founder and webmaster of several community websites, a frequent speaker at schools, organizations and conferences, and a tutor of students from primary school level through to university. He aspires to make a difference in the world.

A more social network


In the immortal words of Vanilla Ice – Stop, collaborate and listen. Collaboration is a cornerstone of modern science and with flight tickets cheaper than ever before and the internet effectively eliminating the expense of correspondence, academics and researchers are looking further afield and reaching more contemporaries across the globe. However, different institutions have different facilities and research focuses, not everyone speaks the same language, and so perhaps these researchers may be picky when it comes to who they work with. It raises the question of whether they do have a preference in collaborator based on affiliation and, if so, can this preference be measured and distilled into cold, hard data?

Of course they do, and of course it can be. More to the point, why?

Arguably the most tangible and conveniently quantifiable way in which academic collaboration manifests is in scientific papers and articles, typically with several authors from varying affiliations. A notable drawback in previous studies on research collaboration is that the measures used (such as the fractional count detailed in Nature Index) consider results for each institution, rather than for each individual academic, and disregard the size of each institution; as a result, smaller and younger institutions may stack up unfavourably compared to those that are more established and larger. For example, take a look at how the eight New Zealand universities compare against each other:

  • The nodes representing each university are weighted by their respective output (total number of co-authored papers by academics affiliated with these universities).
  • The links connecting universities to each other are weighted by the number of papers co-authored by researchers from both institutions.
  • The higher the link weight, the more that the connected universities are attracted to each other.

The skewing effect that university size has on this network is pretty apparent from how Lincoln University has far fewer co-authorships with Victoria University and University of Waikato than with the rest of the network, given its relatively small output. Also of note is that the University of Auckland and AUT have a much lower link weight than one would expect for two universities across the street from each other, yet the University of Auckland and the University of Canterbury have a much stronger link despite being at opposite ends of the country.

First, to address the effect of institution output. We do this using something we call the revealed comparative preference (RCP) of an institution i for collaborating with institution j:

Formula: revealed comparative preference (RCP) of an institution

where Xij is the number of co-authorships between i and j, Xi is the total number of papers co-authored by i with other institutions in the data set, and X is the total number of co-authorships between all the institutions in the data set.

Plainly speaking, it’s a measure of whether two institutions are collaborating with each other more than we might expect, relative to their tendency to collaborate with the other universities in the data set. If Pij > 1, then universities i and j share more co-authorships than we expect relative to the other institutions in the data set, so we say they have a comparative preference for collaborating with each other. Conversely, Pij < 1 indicates that the two universities are collaborating less than we might expect.
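For readers who want to try this on their own data, here is a minimal sketch. The formula image is not reproduced in this copy of the post, so the code assumes the form Pij = (Xij / Xi) / (Xj / X), by analogy with the Balassa index of revealed comparative advantage. It also assumes X is the sum of Xi over all institutions, so that Pij = 1 corresponds to collaboration in proportion to output. The function name and the four institutions A to D are invented for illustration.

```python
def revealed_comparative_preference(coauthorships):
    """Compute an RCP value for every pair of institutions.

    coauthorships: dict mapping (i, j) -> number of co-authored papers,
    with each unordered pair listed once.
    Assumed formula: P_ij = (X_ij / X_i) / (X_j / X).
    """
    # X_i: total co-authorships of institution i with all others
    totals = {}
    for (i, j), count in coauthorships.items():
        totals[i] = totals.get(i, 0) + count
        totals[j] = totals.get(j, 0) + count

    # X: assumed to be the sum of X_i over all institutions
    grand_total = sum(totals.values())

    return {
        (i, j): (count / totals[i]) / (totals[j] / grand_total)
        for (i, j), count in coauthorships.items()
    }


# Invented counts: A and B work closely together, as do C and D
toy_counts = {
    ("A", "B"): 30, ("A", "C"): 5, ("A", "D"): 5,
    ("B", "C"): 5, ("B", "D"): 5, ("C", "D"): 30,
}
rcp = revealed_comparative_preference(toy_counts)
for pair, value in sorted(rcp.items()):
    print(pair, round(value, 2))
```

In this toy network the A to B and C to D links come out well above 1, while the cross links fall below 1, which is the pattern the measure is meant to expose.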

Anyway. Here’s the NZ university network revised with the links now weighted by their corresponding RCP values:

Better. Here it’s apparent that AUT has a stronger link with Auckland Uni in addition to Lincoln and Waikato, and it should be pointed out that University of Auckland, AUT and Massey University are also closer to each other in the network, bearing in mind that all three have campuses within Auckland.

Now with a working measure, we move on to a larger sample. Bring on the Australians.

Clearly the Tasman Sea has a solid effect on the way New Zealand based researchers connect with those based in Australia; the links within the NZ cluster of universities have greater RCP weightings than those within the Australian cluster, implying a preference for domestic rather than trans-Tasman co-operation. Another feature to consider is that the Australian universities in the same states are grouped together, which is consistent with the idea that geographical proximity plays a significant part in a researcher’s choice of collaborator.

It would only be natural to wonder how academics interact on a global scale – do we ever grow out of talking almost exclusively to our friends and shun outsiders in some weird, grown up, Mean Girls-esque collection of cliques?

From observing how the Dutch and German institutions are grouped together, we might conclude that the language barrier is a large hurdle to overcome when jointly writing scientific literature – this also seems apparent from the Chinese-Hong Kong cluster, as well as from the Korean and Japanese institutions. But languages also tend to cluster geographically, so it is hard to disentangle the effect of language from distance.

It’s no question that with the constant progress of technology, connecting with people is becoming less costly. However, there are factors remaining that impede the prospect of a totally connected scientific community, some of which have been speculated on here. Of course pictures and hand waving don’t constitute a solid argument, but a thorough analysis of these factors and their effect on university collaboration will be in store for you, dear reader.

In the meantime, perhaps one should learn German, or Mandarin, or Dutch, or even Japanese. It’s not that hard.

About the data visualisations
In order to make the larger graphs efficient enough to be used in a browser, the number of connections a node could have to other nodes was limited to its top four RCP values. This change had no significant effect on the clustering observed when the full connection matrix was used. The change was only implemented for the QS, ANZAC and benchmark data sets.
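This kind of pruning can be sketched as below. The note above does not say how a link is treated when it is in the top four for one university but not the other, so this sketch assumes a link is kept if it ranks in the top k for at least one of its two endpoints. The function name and toy values are invented for illustration.

```python
def keep_top_links(rcp, k=4):
    """Keep links that are among the k highest RCP values of either endpoint.

    rcp: dict mapping (i, j) -> RCP value, each unordered pair listed once.
    Assumption: being in the top k for one endpoint is enough to keep a link.
    """
    # Collect every link touching each node
    by_node = {}
    for pair, value in rcp.items():
        for node in pair:
            by_node.setdefault(node, []).append((value, pair))

    # For each node, keep its k strongest links
    kept = set()
    for links in by_node.values():
        links.sort(key=lambda item: item[0], reverse=True)
        kept.update(pair for _, pair in links[:k])

    return {pair: rcp[pair] for pair in kept}


# Invented RCP values for a fully connected network of four universities
toy_rcp = {
    ("A", "B"): 3.0, ("C", "D"): 2.5, ("A", "C"): 0.5,
    ("A", "D"): 0.4, ("B", "C"): 0.3, ("B", "D"): 0.2,
}
print(sorted(keep_top_links(toy_rcp, k=1)))
```

With k set to 1 only the strongest link of each university survives, which leaves the two tight pairs and drops the weak cross links.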


Bonnie Yu is a research assistant at Te Pūnaha Matatini and a member of Te Pūnaha Matatini’s Whānau group for emerging scientists. Her research projects focus on university collaboration networks.

The data visualisations of this post were prepared by fellow research assistant, Nickolas Morton.

University of Auckland Equity reporting – also not such a silly idea?


By Caleb Gemmell and Catherine Webb

The Equity Office – Te Ara Tautika – at the University of Auckland has published an equity profile of the university for 2014. The purpose of these yearly publications is admirable – to transparently report how the university is progressing towards its goals of ethnic, gender and disability equity. Upon reading it, however, we were puzzled by the presentation of some of the data. So, in the helpful spirit of our previous Not Such a Silly Idea blog series, we started doing a little deconstruction.

Page 47 is about women in the ‘academic pipeline’ for advancement to senior positions. It shows the presence of women in the university from junior positions (undergraduate students) through to senior positions (professors).

There are several issues with this graph. First of all, can you tell what the Y axis is trying to measure? The numbers are too small to be a sheer headcount. A percentage, then. Eventually, we worked out that this is the percentage made up by women of the total FTE (full-time equivalent). Please label your axes.

Secondly, this graph looks deceptively like a time-series. It gives the impression that we are following a cohort of women over time through their academic careers (which would actually be very useful, by the way). However, the data is simply a snapshot in time across many categories. Truly, there is no reason for this data to be presented in a line graph; this is an ordinal scale, and the categories are discrete. It would be better represented by something like a bar chart.

The report’s use of the term ‘academic pipeline’ to describe the chart suggests that the report’s authors are attempting to show the retention of women as they progress through university careers, to gain an idea of where and with what magnitude women drop out of academia. Unfortunately, the overall impression we get from this graph – that women’s university careers tend to wane above Assistant Lecturer[1] until an uptick in the Senior Leadership Team – is not a valid conclusion to draw from a snapshot of data that compares the results of non-contemporary cohorts.

Another reason the pipeline analogy is inappropriate here is that it implies it is normal to progress from student to staff, and from teaching staff to management. This is not the norm. It is fairly unusual to enter the workforce as a teaching academic with little prior experience. The inclusion of the Senior Leadership Team on the end of this progression is especially strange, since many of those positions are not academic at all – for instance, directors of HR, library, or finance – and certainly don’t require you to be a professor first!

So, we tried to present the Equity Office’s information in a more useful way.


This pulls out the clearest information the data has to offer, showing how the gender composition of each academic position has changed since 2000. Whereas the line graph appeared to prompt lateral comparison, this graph highlights the change over time within each category. It cannot be called a pipeline, but it is useful!

You can see that there is now a more granular breakdown, including HoDs and deans. We did this to be more informative and to smooth the sudden jolt between professors and senior leadership. Something we noticed, apart from the greatly improved gender ratio in almost every category in 2014, was the huge change in the ‘Heads of Departments and Schools’ category. We realised that the Faculty of Arts and the Faculty of Medical and Health Sciences (FMHS) were the only faculties which included both departments and schools, and that the schools were created additionally after 2000. That meant a greater number of heads in total, but not necessarily a greater proportion of women.

[1] The use of the category Assistant Lecturer is itself curious, since no such employment position exists at the University of Auckland. The report’s authors say that the Assistant Lecturer category “…includes GTAs [graduate teaching assistants] tutors etc.” However, GTAs are typically postgraduate students, so would be represented twice in this data. The position of Tutor at UoA has not been in current use for several years, having been replaced by the position of Professional Teaching Fellow (PTF), which is presumably included in the etc. Missing from the data are post-doctoral fellows, research fellows and senior research fellows. These positions often make up a sizable fraction of many departments, the latter two positions being comparable in seniority to the positions of Lecturer and Senior Lecturer. The data is further confused by the fact that the mid-ranges of the pipeline might be better thought to consist of three parallel streams: PTFs (responsible for teaching only), Research Fellows (typically engaged in research, but not teaching), and Lecturers (engaged in both research and teaching).

Meet the team: Q&A with Stephen Marsland


Meet Stephen Marsland – a professor of scientific computing in the computer science cluster of the School of Engineering and Advanced Technology (SEAT) at Massey University. Stephen is also Te Pūnaha Matatini’s new Theme Leader: Complex Data Analytics. “Data is cool at the moment… but it would be nice to see people using it well and understanding what they can and can’t infer from analysis,” Stephen says. Find out more about Stephen’s research and what he hopes to achieve in his new role in the Q&A below.

Tell us about your research, including projects aligned with Te Pūnaha Matatini

My first area of research is the mathematics of shape analysis. This is primarily concerned with geodesics on the diffeomorphism group, which is a mathematical way of describing how flows of smooth, invertible transformations can deform one shape into another in the shortest possible way. I also study invariants to the actions of the groups that can deform images.

More related to Te Pūnaha Matatini is my work in machine learning, which has two parts at the moment: I’m thinking about manifold learning, where we try to find low-dimensional representations of high-dimensional data, and I’m also thinking about dealing with learning about multiple sources of data where all that you see is the combination of the sources. The first is a popular question, but I’m thinking about it very much from the point of view of differential geometry, and how that can help. I’ve got multiple projects going on there with collaborators in England and China.

The second project is with Marcus Frean, another Te Pūnaha Matatini principal investigator. So for example, you might see images of an object on different backgrounds, and you want to work out that the object and the background are different pieces of information.

I’ve got a very big project called AviaNZ going on that combines the shape analysis and machine learning, which is looking at birdsong recognition, in the hope that we can develop algorithms that will recognise birds from their calls and then infer the number of birds from how they are calling.

Finally, I’m interested in complex systems in their classic sense, both complex networks (which are networks with properties such as scale-freeness, or that are small worlds) and also systems where the interactions between agents cause the emergence of high-level properties. I’ve got a variety of projects with students looking at this, including in health, marriage systems, and soon, the evolution of barter (this last one will be funded by Te Pūnaha Matatini).

What attracted you to the role of Theme leader: Complex Data Analytics?

Data analytics underlies everything that we are trying to do in Te Pūnaha Matatini, but it isn’t really getting recognition as a subject in its own right. I’m hoping that by exploring more of the links with the other themes I can make people more aware of how much data analytics there is going on, and what tools are available.

How can complex data analytics benefit New Zealand?

Data is cool at the moment (big data is mentioned everywhere) but it would be nice to see people using it well and understanding what they can and can’t infer from analysis. We collect data everywhere on everything, but lots of it doesn’t actually get used for much. For example, there are thousands of automatic recorders around New Zealand recording birdsong. But unless you have tools to analyse the data, you’ve just got a lot of memory used up storing sound that nobody will ever pay any attention to. Turning data into information isn’t easy, but it has to be done, and done well, to make the collection of the data worthwhile.