The Enron Corpus is a large database of over 600,000 emails generated by 158 employees[1] of the Enron Corporation and acquired by the Federal Energy Regulatory Commission during its investigation after the company's collapse.[2]
This makes it an ideal data set, because:
- It's public
- It's relational
- It has a relatively large number of rows, but still fits inside a 60 GB VM
(Another really good candidate, if you have the VM / hard drive space, is the Stack Overflow data set.)