What is Data Science
Datascience is introduced in 1980s.It is an interdisciplinary field, its true foundations are in statistics, mathematics, computer science and business.
Defination Of Data Science
Data Science can be defined as the study of data and works mainly on large volumes of data coming from different types of sources such as financial logs, multimedia files, marketing forms, sensors and instruments, text files and many more.
We Will Discuss a use case
Data Science plays an important role in effective decision making.
Example:Self-driving or intelligent cars
An intelligent vehicle collects data in real-time from its surroundings through different sensors like radars, cameras, and lasers to create a visual (map) of their surroundings. Based on this data and advanced Machine Learning algorithm, it takes crucial driving decisions like turning, stopping, speeding, etc.
It is also used for data transformation and to create business and IT strategies.
Data science life cycle
- Capture: Data acquisition, data entry, signal reception, data extraction
- Maintain: Data warehousing, data cleansing, data staging, data processing, data architecture
- Process: Data mining, clustering/classification, data modeling, data summarization
- Analyze: Exploratory/confirmatory, predictive analysis, regression, text mining, qualitative analysis
- Communicate: Data reporting, data visualization, business intelligence, decision making
Data science tools
Data scientists must know how to build and run code to create models. The most popular programming languages and open source tools that support data scientists for pre-built statistical, machine learning and graphics capabilities are:
- R: R is the most popular open source programming language among data scientists for developing statistical computing and graphics. R provides a broad variety of libraries and tools for cleansing and prepping data, creating visualizations, and training and evaluating machine learning and deep learning algorithms. It’s also widely used among data science scholars and researchers.
- Python is commonly used for developing websites and software, task automation, data analysis, and data visualization. Since it's relatively easy to learn, Python has been adopted by many non-programmers such as accountants and scientists, for a variety of everyday tasks, like organizing finances. Several Python libraries support data science tasks, including Numpy for handling large dimensional arrays, Pandas for data manipulation and analysis, and Matplotlib for building data visualizations.