A Data Lake is a centralised repository that allows storage of structured and unstructured data at any scale. Data can be stored as-is, without having to first structure it.
A Data Lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video). It can be established "on premises" (within an organisation's data centres) or "in the cloud".
The ability to harness more data, from more sources, in less time, and the ability to empower users to collaborate and analyse data in different ways, lead to better and faster decision-making.
A Data Warehouse is a database optimised to analyse relational data from operational business applications. The data structure and schema are predefined to optimise the data warehouse for rapid reporting and analysis.
A Data Lake is different because it combines relational data from business applications with non-relational data from mobile apps, IoT devices, and social media. The data is stored in its original format, with no structure or schema defined in advance; structure is applied only when the data is read for analysis.
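The contrast between the two approaches can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation: the sample events, the `readings` table, and all function names are hypothetical, an in-memory SQLite database stands in for a Data Warehouse, and a plain list of raw strings stands in for a Data Lake.

```python
import json
import sqlite3

# Hypothetical raw events, as they might arrive from different sources.
RAW_EVENTS = [
    '{"device": "sensor-1", "temp_c": 21.5}',
    '{"user": "alice", "action": "login", "app": "mobile"}',
    '{"device": "sensor-2", "temp_c": 19.0, "humidity": 40}',
]

# Columns of the predefined warehouse schema (an assumption for this sketch).
REQUIRED_FIELDS = ("device", "temp_c")


def load_into_warehouse(raw_events):
    """Warehouse style: the schema is fixed before any data is loaded."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (device TEXT NOT NULL, temp_c REAL NOT NULL)"
    )
    rejected = []
    for line in raw_events:
        record = json.loads(line)
        if all(field in record for field in REQUIRED_FIELDS):
            # Fields outside the schema (e.g. humidity) are dropped on load.
            conn.execute(
                "INSERT INTO readings (device, temp_c) VALUES (?, ?)",
                (record["device"], record["temp_c"]),
            )
        else:
            # Records that do not fit the schema cannot be stored.
            rejected.append(record)
    return conn, rejected


def store_in_lake(raw_events):
    """Lake style: keep every event exactly as received."""
    return list(raw_events)


def read_from_lake(lake, field):
    """Apply structure only at read time, for the field of interest."""
    values = []
    for line in lake:
        record = json.loads(line)
        if field in record:
            values.append(record[field])
    return values


if __name__ == "__main__":
    conn, rejected = load_into_warehouse(RAW_EVENTS)
    print(conn.execute("SELECT device, temp_c FROM readings").fetchall())
    print(rejected)
    lake = store_in_lake(RAW_EVENTS)
    print(read_from_lake(lake, "humidity"))
```

In this sketch the warehouse loader rejects the login event and discards the humidity value because neither fits the predefined table, whereas the lake keeps all three events unchanged, so a later question about humidity or user actions can still be answered.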
Many organisations see the benefits of Data Lakes and are extending their traditional Data Warehouse with Data Lake functionality to discover new information models through the application of data science.