Big Data, the new big thing of the industry, comprises structured and unstructured data.
Knowing the difference between the two data types is essential to learning the technology and process of analyzing data.
Structured Data
made up of clearly defined data types whose patterns make it easy to search
- resides in relational databases (RDBMS), which store length-delineated data such as phone numbers, Social Security numbers (SSNs) and ZIP codes
- is stored in text format according to a pre-defined data model
- generated by humans or machines
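To make the points above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `contacts` table, its columns and the sample rows are hypothetical, invented only for illustration. The sketch assumes a pre-defined model with fixed-length fields, and shows how that model makes the data easy to search and lets the database reject values that do not fit.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE contacts (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        phone TEXT NOT NULL CHECK (length(phone) = 10),
        zip   TEXT NOT NULL CHECK (length(zip) = 5)
    )
    """
)
conn.executemany(
    "INSERT INTO contacts (name, phone, zip) VALUES (?, ?, ?)",
    [
        ("Ada", "5550100001", "94105"),
        ("Grace", "5550100002", "10001"),
        ("Linus", "5550100003", "94105"),
    ],
)


def names_in_zip(connection, zip_code):
    """Return the names stored under a given ZIP code, sorted by name."""
    cursor = connection.execute(
        "SELECT name FROM contacts WHERE zip = ? ORDER BY name", (zip_code,)
    )
    return [row[0] for row in cursor.fetchall()]


def try_insert(connection, name, phone, zip_code):
    """Insert a row; return False if it violates the pre-defined model."""
    try:
        connection.execute(
            "INSERT INTO contacts (name, phone, zip) VALUES (?, ?, ?)",
            (name, phone, zip_code),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(names_in_zip(conn, "94105"))
print(try_insert(conn, "Bad", "123", "94105"))
```

Because every row follows the same model, a search is a single query on a known column, and a phone number of the wrong length is refused at insert time.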
Unstructured Data
simply "everything else": it has internal structure, but not one defined by a pre-defined data model or schema, which makes it difficult to search
- resides in applications and NoSQL databases
- has no specific pre-defined model and may be in text, video, image, sound or other formats
- generated by humans or machines
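The search difficulty can be illustrated with a short sketch. The free-text notes below are hypothetical, and the regular expression is only one assumed way of guessing what a phone number looks like. With no column to query, the code has to scan every piece of text and rely on a heuristic pattern.

```python
import re

# Hypothetical free-text notes, invented for illustration only.
notes = [
    "Called Ada this morning, she said to reach her at 555-010-0001 after lunch.",
    "Meeting moved to Thursday. No contact details given.",
    "Grace's new number: (555) 010 0002. Old one is disconnected.",
]

# Assumed pattern: three digits, three digits, four digits, with optional
# parentheses around the first group and optional spaces or hyphens between.
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}")


def find_phone_numbers(texts):
    """Scan every text and return the matched numbers as bare digit strings."""
    found = []
    for text in texts:
        for match in PHONE_PATTERN.findall(text):
            found.append(re.sub(r"\D", "", match))  # keep digits only
    return found


print(find_phone_numbers(notes))
```

The pattern only finds numbers written in a layout it anticipates, so a number spelled out in words or formatted differently would be missed. That fragility is the practical cost of having no pre-defined model.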
Semi-Structured Data
maintains internal tags and markings that identify separate data elements
Examples
- XML (Extensible Markup Language) : a set of document-encoding rules in a format readable by both humans and machines
- JSON : data enclosed in braces and organized as key-value pairs
- emails : a common example of semi-structured data
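As a rough sketch of how the three examples can be read, the snippet below uses only Python's standard library (json, xml.etree.ElementTree and email). The sample record, its field names and the email address are hypothetical. The point is that the tags, keys and headers carried inside the data are enough to pick out individual elements, even though no external schema is defined.

```python
import json
import xml.etree.ElementTree as ET
from email import message_from_string

# Hypothetical samples, invented for illustration only.
json_text = '{"name": "Ada", "zip": "94105"}'
xml_text = "<contact><name>Ada</name><zip>94105</zip></contact>"
email_text = (
    "From: ada@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Free-form body text goes here.\n"
)


def zip_from_json(text):
    """Look up a value by its key in a JSON object."""
    return json.loads(text)["zip"]


def zip_from_xml(text):
    """Look up a value by its tag name in an XML document."""
    return ET.fromstring(text).findtext("zip")


def parse_email(text):
    """Split an email into tagged headers and its free-form body."""
    message = message_from_string(text)
    return message["From"], message["Subject"], message.get_payload()


print(zip_from_json(json_text))
print(zip_from_xml(xml_text))
print(parse_email(email_text))
```

The email shows why the category is called semi-structured: the headers are labeled fields that can be looked up by name, while the body is free text with no further markings.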
Differences in Chart