What is Big Data?


Today, almost any interaction made over the internet or through the consumption of goods and services is tracked, stored, and used in targeted ways. This has led to the notion of big data -- massive amounts of data that reflect the behavior and actions of various populations. Data scientists and data collection platforms can now computationally organize petabytes and exabytes of data so that it is easier to analyze and to identify patterns that might otherwise have gone undetected. Given the complexity of such large, diverse data sets, how the information is displayed is crucial to making use of it. Visual data analysis blends highly advanced computational methods with sophisticated graphics engines to illuminate patterns and structure in even the most complex visual presentations. Information visualization uses infographics, graphical representations of technical data designed to be quickly and easily understood. In education, data mining is already underway to target at-risk students, personalize learning, and create flexible pathways to success. As educational institutions become more adept at working with and interpreting big data, they can make more informed decisions that reflect real learner needs.

INSTRUCTIONS: Enter your responses to the questions below. This is most easily done by moving your cursor to the end of the last item and pressing RETURN to create a new bullet point. Please include URLs whenever you can (full URLs will automatically be turned into hyperlinks; please type them out rather than using the linking tools in the toolbar).

Please "sign" your contributions by adding 4 tildes (~) in a row so that we can follow up with you if we need additional information or leads to examples. This produces a signature when the page is updated, like this: - gordon gordon Sep 22, 2016

(1) How might this technology be relevant to academic and research libraries?

  • - wlougee wlougee Oct 25, 2016 Libraries, particularly research libraries, are playing an increasingly active role in the data management/curation process. This spans everything from consulting with researchers who are developing (often required) data management plans, to helping organize and manage the data, to providing repositories for data, to dissemination. - mylee.joseph mylee.joseph Nov 8, 2016 - vacekrae vacekrae Nov 12, 2016- Sandy.hirsh Sandy.hirsh Nov 13, 2016
  • Data is increasingly treated as another collection by libraries. It is submitted with ETDs, deposited into institutional repositories, handled as part of reference services, and treated as a publication with citations and DOIs. Understanding how data is created, used, and transformed is part of the research or scholarship lifecycle, in which libraries have always been a partner. - shorisyl shorisyl Oct 27, 2016- Sandy.hirsh Sandy.hirsh Nov 13, 2016
  • Big data offers completely new ways of retrieving and cataloguing data. There will probably be completely new ways of providing access to library data via big data technologies. However, I doubt that the term "big data" applies to the data in libraries, since the amount of data is comparatively small. - franziska.regner franziska.regner Nov 1, 2016 - andreas.kirstein andreas.kirstein Nov 11, 2016 Big data refers to both large volumes of data and complex data. Libraries have a lot of complex data. - Laurents.Sesink Laurents.Sesink Nov 12, 2016 Some research libraries do produce lots of big data. - vacekrae vacekrae Nov 12, 2016
  • I agree with the reasons listed above. - dianeb dianeb Nov 6, 2016
  • In addition to being an important new area for collecting, the emergence of data collections offers new opportunities, and in fact a need, for new discovery tools. - mcalter mcalter Nov 6, 2016 - mstephens7 mstephens7 Nov 10, 2016 - vacekrae vacekrae Nov 12, 2016- Sandy.hirsh Sandy.hirsh Nov 13, 2016
  • Quantitative literacy is a skill that can be as important for navigating the world as information literacy but has been neglected by libraries. Libraries can take a lead role in helping integrate data skills and quantitative thinking into the curriculum.

(2) What themes are missing from the above description that you think are important?

  • - wlougee wlougee Oct 25, 2016 Attention to issues of metadata and repositories, digital preservation.
  • - wlougee wlougee Oct 25, 2016 Expertise needed to handle different types of data. UMN Libraries is the lead in the Sloan Foundation-funded Data Curation Network, bringing 6 institutions together to share expertise for different data types and address issues of shared curation workflow. See: https://sites.google.com/site/datacurationnetwork
  • Issues pertaining to the ethical use of data, the quality of data that is found during information retrieval, metadata interoperability, data ownership. - shorisyl shorisyl Oct 27, 2016
  • Completely new skills and services will be needed / will arise in the information and library sector: data scientists, staff for the visualisation of data, community managers working with other big data users, etc. - franziska.regner franziska.regner Nov 1, 2016 - Laurents.Sesink Laurents.Sesink Nov 12, 2016- vacekrae vacekrae Nov 12, 2016 But for many libraries these are not completely new. - kristi-thompson kristi-thompson Nov 12, 2016
  • Also a new emphasis on visual literacies for students as part of their explorations of information literacy. - mstephens7 mstephens7 Nov 10, 2016
  • Text and data mining expertise - franziska.regner franziska.regner Nov 11, 2016
  • Deep learning and machine learning understanding (not expertise) - franziska.regner franziska.regner Nov 11, 2016
  • Expertise in statistics, data visualisation, data interpretation - franziska.regner franziska.regner Nov 11, 2016
  • Copyright expertise, licensing expertise - franziska.regner franziska.regner Nov 11, 2016
  • The library as a data provider for research questions - franziska.regner franziska.regner Nov 11, 2016 - andreas.kirstein andreas.kirstein Nov 11, 2016 - kristi-thompson kristi-thompson Nov 12, 2016
  • Data Privacy - franziska.regner franziska.regner Nov 11, 2016 - andreas.kirstein andreas.kirstein Nov 11, 2016
  • The tools (hardware and software) that libraries need to have to process and visualize the data - vacekrae vacekrae Nov 12, 2016
  • Data retention policies and copyright issues (education and practices/policies) - vacekrae vacekrae Nov 12, 2016 - kristi-thompson kristi-thompson Nov 12, 2016
  • As the mentions of data management plans, collections, etc. in many of the comments make clear, it isn't just big data that is a concern but small and medium-sized datasets as well - surveys of a couple thousand people, a few years of soil sampling data, a gigabyte or two of recorded bird calls and their codings... It's the diversity and complexity as well as the size that make these a challenge, especially with new funding requirements for management. Documentation / metadata is a serious problem. Cataloging, documentation, integration into the scholarly record, citation practices... our ability to make data available and usable as a scholarly resource is decades behind where we are with books, articles and even most grey literature. The standards and tools are still being developed and fought over - both formal metadata standards such as DDI and informal standards of practice like how and when to cite or otherwise acknowledge secondary data reuse.

(3) What do you see as the potential impact of this technology on academic and research libraries?

  • - wlougee wlougee Oct 25, 2016 Opportunities for experiential learning; Libraries' role in providing data sets for curricular use - franziska.regner franziska.regner Nov 1, 2016
  • Capacity building with respect to repositories, metadata services, research support, and digital scholarship production support. - shorisyl shorisyl Oct 27, 2016 - franziska.regner franziska.regner Nov 1, 2016- vacekrae vacekrae Nov 12, 2016
  • Libraries need to continue safeguarding privacy and providing an ethical framework for working with personal data - shorisyl shorisyl Oct 27, 2016 - franziska.regner franziska.regner Nov 1, 2016
  • Assisting library clients in assessing what is missing from a big data set is important in order to identify unintended bias or to examine correlation vs causation. The increasing requirement for researchers to make their data available, where public funding has been used, increases the potential for data sets to be reused, remixed or combined, increasing the importance of the metadata and the ability to interpret the context within which the data was collected before reusing it or relying on it. This is a key area of teaching and learning support that libraries will need to address. - mylee.joseph mylee.joseph Nov 8, 2016 - kristi-thompson kristi-thompson Nov 12, 2016
  • I agree about privacy, but we should also be mindful of the balance between privacy and data shared openly, e.g. life tracking data, geo-mobile data, etc. - mstephens7 mstephens7 Nov 10, 2016 Finding the balance between privacy and using data to improve services is a tricky thing. - vacekrae vacekrae Nov 12, 2016
  • Funders demand FAIR (findable, accessible, interoperable, reusable) data. Libraries can support the FAIRification of data. - Laurents.Sesink Laurents.Sesink Nov 12, 2016

(4) Do you have or know of a project working in this area?



Please share information about related projects in our Horizon Project sharing form.