Topic / Target
EINFRA-1-2014: Managing, preserving and computing with big research data
The development and deployment of integrated, secure, permanent, on-demand, service-driven, privacy-compliant and sustainable e-infrastructures incorporating advanced computing resources and software are essential to increase the capacity to manage, store and analyse extremely large, heterogeneous and complex datasets, including text mining of large corpora.
These e-infrastructures need to provide services that cut across a wide range of scientific communities and address a diversity of computational requirements, legal constraints and requirements, system and service architectures, formats, types, vocabularies and legacy practices of the scientific communities that generate, analyse and use the data.

Scope: Proposals should address at least one of the first five (5) activities, or activities 6, 7 or 8 individually.
Proposers are encouraged to build on prior work on open prototype services and to use discoverable service catalogues, common APIs, service-level agreements (SLAs) and transparent billing.
(1) Establishing a federated pan-European data e-infrastructure to provide cost-effective and interoperable solutions for data management and long term preservation;
(2) Services to ensure the quality and reliability of the e-infrastructure, including certification mechanisms for repositories and certification services to test and benchmark capabilities in terms of resilience and service continuity of e-infrastructures;
(3) Federating institutional and, where possible, private data management and curation tools and services used across the full data lifecycle or at specific stages of it, including approaches for identifying open data sources and data collected under sensitive or restricted-access conditions;
(4) Large-scale virtualisation of data/compute centre resources to achieve on-demand compute capacities, improve flexibility for data analysis and avoid unnecessary, costly large data transfers;
(5) Development and adoption of a standards-based computing platform (with open software stack) that can be deployed on different hardware and e-infrastructures (such as clouds providing infrastructure-as-a-service (IaaS), HPC, grid infrastructures…) to abstract application development and execution from available (possibly remote) computing systems;
(6) Support to the evolution of EGI (European Grid Infrastructure) towards a flexible compute/data infrastructure capable of federating and enabling the sharing of resources of any kind (public or private, grid or cloud, etc.) in order to offer computing and storage services to the whole European scientific community;
(7) Proof of concept and prototypes of data infrastructure-enabling software (e.g. for databases and data mining) for extremely large or highly heterogeneous datasets scaling to zettabytes and trillions of objects;
(8) Enabling the creation of a platform and infrastructure for mining text aggregated from different sources/publishers that responds to the needs of users (researchers).
Expected impact:
- Increased availability of scientific data for scientific communities, regardless of whether they have already embraced e-science; this will be measured by cross-border data traffic over the research networks in Europe as a proxy.
- More efficient use of IT equipment for research.
- Avoiding lock-in to particular hardware or software platforms in the development of science.
- Scientific communities embrace storage and computing infrastructures as state-of-the-art services become available and the learning curve for their use becomes less steep; this will be measured by the storage capacity available for pan-European use as well as by the number of users of EGI and other production e-infrastructures in this area.
- Through the development of large, pooled and interoperable text mining infrastructures, economies of scale will reduce overall costs, while wider adoption of open licensing schemes will boost the exchange of text mining resources and practices.