To preserve billions of web pages, blogs and e-books which appear on the UK web domain, the British Library will begin to "harvest" the internet. The library hopes to document the entire domain, and could eventually build a database holding every public Tweet or Facebook page.
"If you want a picture of what life is like today in the UK you have to look at the web," explained project leader Lucie Burgess.
"We have already lost a lot of material, particularly around events such as the 7/7 London bombings or the 2008 financial crisis."
"That material has fallen into the digital black hole of the 21st century because we haven't been able to capture it."
"Most of that material has already been lost or taken down. The social media reaction has gone," she added.
A three-month operation to harvest an initial 4.8 million websites -- or one billion web pages -- will begin on Friday, the first step in the ambitious project.
The project, which has so far cost £3 million, was sparked by a change in regulations which now allow a small number of libraries to hold digital content without seeking copyright clearance.
"The regulations now coming into force make digital legal deposit a reality, and ensure that the Legal Deposit Libraries themselves are able to evolve - collecting, preserving and providing long-term access to the profusion of cultural and intellectual content appearing online or in other digital formats," he added.
Source-AFP