Glossary

Data glossary

Name Definition

Data

Source

Data is a set of values of qualitative or quantitative variables. [...] While the concept of data is commonly associated with scientific research, data is collected by a huge range of organizations and institutions, including businesses (e.g., sales data, revenue, profits, stock price), governments (e.g., crime rates, unemployment rates, literacy rates) and non-governmental organizations (e.g., censuses of the number of homeless people by non-profit organizations).

Data Model

Source

A Data Model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world. For instance, a data model may specify that a data element representing a car comprise a number of other elements which in turn represent the color, size and owner of the car.
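
The car example above can be sketched in code. This is only an illustration, assuming Python dataclasses as the modelling notation; the class and field names (Car, Owner, color, size) are hypothetical and not part of any standard.

```python
from dataclasses import dataclass


@dataclass
class Owner:
    """A hypothetical element representing the owner of a car."""
    name: str


@dataclass
class Car:
    """A car element composed of other elements: color, size and owner."""
    color: str
    size: str
    owner: Owner


# One concrete instance that follows the model.
car = Car(color="red", size="compact", owner=Owner(name="Alice"))
print(car.owner.name)
```

The model fixes which elements a car has and how they relate; the values themselves are the data.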

Data Quality

Source

Data quality refers to the level of quality of data. There are many definitions of data quality but data are generally considered high quality if “they are fit for their intended uses in operations, decision making and planning.”

Open Data

Source

Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the open data movement are similar to those of other “open” movements such as open source, open hardware, open content, and open access.

Query

Source

A web search query is a query that a user enters into a web search engine to satisfy his or her information needs. Web search queries are distinctive in that they are often plain text or hypertext with optional search-directives (such as “and”/“or”, with “-” to exclude). They vary greatly from standard query languages, which are governed by strict syntax rules as command languages with keyword or positional parameters.

Semantic Web

Source

A representation in two (or possibly three) dimensions of the semantic relationships between and among terms and the concepts they represent; (ANSI/NISO Z39.19-200x). The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.

Taxonomy

Source

A collection of controlled vocabulary terms organized into a hierarchical structure.

Unstructured Data

Source

Data that is more free-form, such as multimedia files, images, sound files, or unstructured text. Unstructured data does not necessarily follow any format or hierarchical sequence, nor does it follow any relational rules. Unstructured data refers to masses of (usually) computerized information which do not have a data structure which is easily readable by a machine.

OpenDataSoft glossary

Name Definition
Assets Assets are the graphical elements uploaded to the platform. Assets can be images or fonts, and they can be used on custom pages.
Catalog The catalog is a register of all the datasets you have on your platform. The collection of datasets is organized and can be browsed with a full-text search and filtered using the datasets’ characteristics.

Chart

Source

A chart, also called a graph, is a graphical representation of data, in which “the data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart”. A chart can represent tabular numeric data, functions or some kinds of qualitative structure.
Chart builder Chart builder is the chart building solution of OpenDataSoft. With Chart Builder, you can choose a visualization type, choose the data to display, and customize the X and Y axes and the colors.

Choropleth map

Source

A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income.
Console (API) The API console allows people to interact with the application programming interface. The console offers a range of input parameters so that the different API responses can be viewed.
Data schema (dataset) The data schema describes the properties attached to each field of the records in a dataset. The data schema includes each field’s name, type and an example value.

Dataset

Source

A dataset is an organized collection of data. The most basic representation of a dataset is data elements presented in tabular form. Each column represents a particular variable. Each row corresponds to a given value of that column’s variable. A dataset may also present information in a variety of non-tabular formats, such as an Extensible Markup Language (XML) file, a geospatial data file, or an image file.
Description (dataset) The description is a text attached to the dataset that allows users to understand the data inside the dataset. A good description helps users find relevant information.

Document

Source

A file containing Unstructured and/or Semi-Structured Data Resources. A discrete and unique electronic aggregation of data produced with the intent of conveying information. All data within a document may be in the same format (e.g., text), or a document may be a composite that consists of sets of data in a variety of formats (e.g., MS Word files containing embedded graphics).

File format

Source

A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.
Harvester A harvester is an automated process that fetches data from a remote portal. The harvester fetches the datasets from a remote portal and automatically copies them to the platform. The datasets fetched can be queried with parameters.
Keyword (dataset) Keywords help users understand the data content of the dataset. They are used to filter, search and browse datasets by content. Keywords are unconstrained and can be typed freely.
License (dataset) The license refers to the permissions attached to a dataset regarding conditions of use, reproduction and commercial use.
Map builder Map builder is the map building solution of OpenDataSoft. With Map Builder, you can quickly add datasets to a geographical view and customize the colors, data clustering methods and tooltips.

Metadata

Source

Metadata are “data that provide information about other data”. Two types of metadata exist: structural metadata and descriptive metadata. Structural metadata are data about the containers of data. Descriptive metadata are about individual instances of application data or the data content.
Publisher (dataset) The publisher is the entity responsible for disseminating the data, either to the general public as Open Data or to targeted users.

Record

Source

A record (also called struct or compound data) is a basic data structure. A record is a collection of fields, possibly of different data types, typically in fixed number and sequence.
Reuse A reuse is a voluntary declaration, by anyone, that a dataset is used in another context (a map, an application, a website).
Subdomain A subdomain is a child domain of a parent domain. A parent domain can distribute content to, or collect content from, its child domains.
Tags Tags (or keywords) help users discover your dataset and should include terms that would be used by technical and non-technical users.
Theme (dataset) A theme is a dataset topic that helps group datasets into broader categories. Themes are constrained and must be chosen from a list.

Technical glossary

Name Definition

API

Source

An application programming interface (API) is a set of definitions of the ways one piece of computer software communicates with another. It is a method of achieving abstraction, usually (but not necessarily) between higher-level and lower-level software.

API Key

Source

An application programming interface key (API key) is a code passed in by computer programs calling an application programming interface (API) to identify the calling program, its developer, or its user to the Web site.

Basic Auth

Source

HTTP Basic authentication (BA) implementation is the simplest technique for enforcing access controls to web resources because it doesn’t require cookies, session identifiers, or login pages; rather, HTTP Basic authentication uses standard fields in the HTTP header, obviating the need for handshakes.
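
As a minimal sketch of the "standard fields in the HTTP header" mentioned above: the client joins the user name and password with a colon, Base64-encodes the result, and sends it in the Authorization header. The helper name basic_auth_header is hypothetical, and the credentials are the well-known example pair from the HTTP Basic specification.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header used by HTTP Basic authentication."""
    # "username:password" is Base64-encoded, not encrypted.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


header = basic_auth_header("Aladdin", "open sesame")
print(header["Authorization"])
```

Because Base64 is trivially reversible, Basic authentication is normally only used over an encrypted (HTTPS) connection.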

CKAN (Comprehensive Knowledge Archive Network)

Source

CKAN stands for Comprehensive Knowledge Archive Network, an open source data management system that is the basis of the Data.gov catalog, as well as the open data catalogs of approximately 50 data hubs around the world.
Connector A connector is a computer program specifically designed to connect to a data source. A data source can be another Open Data portal or an FTP server.

CSV (Comma-Separated Values)

Source

A comma-separated values (CSV) file is a plain-text data file used to store tabular data. Each line in the CSV file corresponds to a row in the table. Within a line, fields are separated by commas and each field belongs to one table column.
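
The row and column structure described above can be illustrated with Python's standard csv module. This is a small sketch; the sample content (city and population columns) is made up for the example, and the first line is assumed to hold the column names.

```python
import csv
import io

# Hypothetical CSV content: one line per table row, fields separated by commas.
text = "city,population\nParis,2100000\nLyon,520000\n"

# DictReader uses the first line as column names and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(text)))
for row in rows:
    print(row["city"], row["population"])
```

Note that every field is read as text; converting "2100000" to a number is left to the program that consumes the file.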

CSW (Catalog Service for the Web)

Source

Catalog Service for the Web (CSW), sometimes seen as Catalog Service - Web, is a standard for exposing a catalog of geospatial records in XML on the Internet (over HTTP). The catalog is made up of records that describe geospatial data (e.g. KML), geospatial services (e.g. WMS), and related resources.

Database

Source

A database is an organized collection of data. It is the collection of schemas, tables, queries, reports, views, and other objects.

DKAN (Drupal based CKAN)

Source

DKAN is an open-source data management platform built on Drupal.

DNS

Source

The Domain Name System (DNS) is a hierarchical decentralized naming system for computers, services, or any resource connected to the Internet or a private network.

Endpoint

Source

An endpoint indicates a specific location for accessing a service using a specific protocol and data format.

EPSG (European Petroleum Survey Group)

Source

The EPSG Geodetic Parameter Dataset is a structured dataset of Coordinate Reference Systems and Coordinate Transformations [...] The geographic coverage of the data is worldwide, but it is stressed that the dataset does not and cannot record all possible geodetic parameters in use around the world.

FTP

Source

The File Transfer Protocol (FTP) is a standard network protocol used to transfer computer files between a client and server on a computer network.

Geocoding

Source

Geocoding is the computational process of transforming a postal address description to a location on the Earth’s surface.

HTML (HyperText Markup Language)

Source

HyperText Markup Language (HTML) is the standard markup language for creating web pages and web applications. With Cascading Style Sheets (CSS), and JavaScript, it forms a triad of cornerstone technologies for the World Wide Web. Web browsers receive HTML documents from a web server or from local storage and render them into multimedia web pages.

HTTP (HyperText Transfer Protocol)

Source

The primary method used to convey information on the World Wide Web. HTTP is a request/response protocol between clients and servers.

JSON (JavaScript Object Notation)

Source

JSON (JavaScript Object Notation) is an open-standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is the most common data format used for asynchronous browser/server communication (AJAJ), largely replacing XML which is used by AJAX.
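
As an illustration of attribute–value pairs, the sketch below parses a small JSON text with Python's standard json module and serializes it again. The object and its attribute names (name, records, tags) are invented for the example.

```python
import json

# Hypothetical JSON text: an object made of attribute-value pairs.
text = '{"name": "bike-stations", "records": 42, "tags": ["transport", "city"]}'

# Parse the text into Python objects (dict, int, list, str).
dataset = json.loads(text)
print(dataset["tags"][0])

# Serialize the object back to JSON text.
round_trip = json.dumps(dataset, sort_keys=True)
```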

KML (Keyhole Markup Language)

Source

Keyhole Markup Language (KML) is an XML notation for expressing geographic annotation and visualization within Internet-based, two-dimensional maps and three-dimensional Earth browsers.

KMZ (Keyhole Markup Zipped)

Source

KML files are very often distributed in KMZ files, which are zipped files with a .KMZ extension. When a KMZ file is unzipped, a single doc.kml is found along with any overlay and icon images referenced in the KML and any network-linked KML files.
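
Since a KMZ file is a ZIP archive, it can be handled with ordinary ZIP tooling. The sketch below, which assumes a made-up one-placemark KML document, builds a tiny KMZ in memory with Python's zipfile module and then reads doc.kml back out.

```python
import io
import zipfile

# A deliberately tiny, hypothetical KML document.
KML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<kml xmlns="http://www.opengis.net/kml/2.2">'
    "<Placemark><name>Example</name></Placemark></kml>"
)

# Build a KMZ archive in memory: a ZIP file containing doc.kml.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as kmz:
    kmz.writestr("doc.kml", KML)

# Open the archive again and read the KML back.
with zipfile.ZipFile(buffer) as kmz:
    names = kmz.namelist()
    kml_text = kmz.read("doc.kml").decode("utf-8")

print(names)
```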

LDAP (Lightweight Directory Access Protocol)

Source

The Lightweight Directory Access Protocol is an open, vendor-neutral, industry standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. Directory services play an important role in developing intranet and Internet applications by allowing the sharing of information about users, systems, networks, services, and applications throughout the network.

Machine-Readable File

Source

Refers to information or data that is in a format that can be easily processed by a computer without human intervention while ensuring no semantic meaning is lost.

Mapbox

Source

Mapbox is a large provider of custom online maps for websites such as Foursquare, Pinterest, Evernote, the Financial Times, The Weather Channel and Uber Technologies. Since 2010, it has rapidly expanded the niche of custom maps, as a response to the limited choice offered by map providers such as Google Maps.

OAuth

Source

OAuth is an open standard for authorization, commonly used as a way for Internet users to log in to third party websites using their Google, Facebook, Microsoft, Twitter, One Network, etc. accounts without exposing their password. Generally, OAuth provides to clients a “secure delegated access” to server resources on behalf of a resource owner.

OData

Source

Open Data Protocol (OData) is an open protocol which allows the creation and consumption of queryable and interoperable RESTful APIs in a simple and standard way.

Open Source Software

Source

Computer software that is available in source code form: the source code and certain other rights normally reserved for copyright holders are provided under an open-source license that permits users to study, change, improve and at times also to distribute the software. Open source software is very often developed in a public, collaborative manner.
Parser (or extractor) A parser is a computer program that takes a file as input, then processes and indexes it so that the platform or people can perform complex operations on it.

RDF (Resource Description Framework)

Source

The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax notations and data serialization formats. It is also used in knowledge management applications.

REST (Representational state transfer)

Source

In computing, representational state transfer (REST) is an architectural style used for web development. Systems and sites designed using this style aim for fast performance, reliability and the ability to scale (to grow and easily support extra users). To achieve these goals, developers work with reusable components that can be managed and updated without affecting the system as a whole while it is running.

RSS (Rich Site Summary)

Source

RSS (Rich Site Summary; originally RDF Site Summary; often called Really Simple Syndication) uses a family of standard web feed formats to publish frequently updated information: blog entries, news headlines, audio, video. An RSS document (called “feed”, “web feed”, or “channel”) includes full or summarized text, and metadata, like publishing date and author’s name.
RSS Feed The URL of an RSS feed that provides access to the dataset.

SAML (Security Assertion Markup Language)

Source

Security Assertion Markup Language (SAML) is an XML-based, open-standard data format for exchanging authentication and authorization data between parties, in particular, between an identity provider and a service provider.

Shapefile

Source

The shapefile format is a popular geospatial vector data format for geographic information system (GIS) software. A shapefile stores non-topological geometry and attribute information for the spatial features in a dataset. The geometry for a feature is stored as a shape comprising a set of vector coordinates. Shapefiles can support point, line, and area features.

SOAP (Simple Object Access Protocol)

Source

SOAP (Simple Object Access Protocol) is a message-based protocol based on XML for accessing services on the Web. It employs XML syntax to send text commands across the Internet using HTTP. SOAP is similar in purpose to the DCOM and CORBA distributed object systems, but is more lightweight and less programming-intensive. Because of its simple exchange mechanism, SOAP can also be used to implement a messaging system.

SQL (Structured Query Language)

Source

SQL (Structured Query Language) is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS).
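
To give a feel for managing data in a relational database, here is a minimal sketch using SQLite through Python's standard sqlite3 module. The table name, columns and values are hypothetical sample data, and SQLite is chosen only because it needs no server.

```python
import sqlite3

# An in-memory SQLite database, discarded when the connection closes.
conn = sqlite3.connect(":memory:")

# Define a table, then insert a few rows using parameter placeholders.
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city (name, population) VALUES (?, ?)",
    [("Paris", 2100000), ("Lyon", 520000), ("Nantes", 320000)],
)

# Query the rows that match a condition, sorted by name.
large = conn.execute(
    "SELECT name FROM city WHERE population > ? ORDER BY name", (500000,)
).fetchall()
print(large)
conn.close()
```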

SSL certificate

Source

An SSL certificate is an electronic document used to secure connections between a website and its visitors. The certificate includes information about the public key, information about its owner’s identity, and the digital signature of an entity that has verified the certificate’s contents are correct.

Swagger

Source

The OpenAPI Specification (originally known as the Swagger Specification) is a specification for machine-readable interface files for describing, producing, consuming, and visualizing RESTful web services. A variety of tools can generate code, documentation and test cases given an interface file

Tiles

Source

Tiles are image files, requested individually over the internet, that are seamlessly joined to create a map.

Token

Source

A token is a piece of data that is used in network communications (often over HTTP) to identify a session, a series of related message exchanges. On the platform, tokens allow you to connect to external services.

TSV (Tab Separated Values)

Source

A simple text format for a database table. Each record in the table is one line of the text file. Each field value of a record is separated from the next by a tab stop character. It is a form of the more general delimiter-separated values format.
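
A TSV file can be read with the same standard csv module used for CSV files, with the tab character as the delimiter. The sketch below uses invented sample content, and treating the first line as a header is an assumption of the example.

```python
import csv
import io

# Hypothetical TSV content: one record per line, fields separated by tabs.
text = "name\ttheme\nbike-stations\ttransport\nair-quality\tenvironment\n"

# csv.reader splits each line on the tab character.
records = list(csv.reader(io.StringIO(text), delimiter="\t"))
header, rows = records[0], records[1:]
print(header)
print(rows)
```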

Turtle (Terse RDF Triple Language)

Source

Turtle (Terse RDF Triple Language) is a format for expressing data in the Resource Description Framework (RDF) data model with a syntax similar to SPARQL. RDF, in turn, represents information using “triples”, each of which consists of a subject, a predicate, and an object. Each of those items is expressed as a Web URI.

Web Service

Source

A Web service is a service offered by an electronic device to another electronic device, communicating with each other via the World Wide Web. In a Web service, Web technology such as HTTP, originally designed for human-to-machine communication, is utilized for machine-to-machine communication, more specifically for transferring machine readable file formats such as XML and JSON.

WFS (Web Feature Service)

Source

The Web Feature Service Interface Standard (WFS) provides an interface allowing requests for geographical features across the web using platform-independent calls.

WSDL (Web Services Description Language)

Source

The Web Services Description Language is an XML-based interface definition language that is used for describing the functionality offered by a web service.

XML (Extensible Markup Language)

Source

XML (Extensible Markup Language) is a general-purpose specification for creating custom markup languages. It is classified as an extensible language, because it allows the user to define the mark-up elements. XML’s purpose is to aid information systems in sharing structured data especially via the Internet, to encode documents, and to serialize data.
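
The idea of user-defined mark-up elements can be illustrated with Python's standard xml.etree.ElementTree module. In this sketch the element and attribute names (dataset, title, theme, id) are invented for the example; XML itself does not prescribe them.

```python
import xml.etree.ElementTree as ET

# Build a small document from custom, user-defined elements.
dataset = ET.Element("dataset", attrib={"id": "bike-stations"})
ET.SubElement(dataset, "title").text = "Bike stations"
ET.SubElement(dataset, "theme").text = "transport"

# Serialize to text, as it would be stored or sent over the Internet.
xml_text = ET.tostring(dataset, encoding="unicode")
print(xml_text)

# Parse the text back into a tree and read one element.
parsed = ET.fromstring(xml_text)
print(parsed.find("title").text)
```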