Java Web Crawler

A Web crawler, sometimes called a spider, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering).
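"Systematically browses" usually comes down to a breadth-first traversal: keep a frontier of URLs still to visit plus a set of URLs already seen, so no page is fetched twice. Here is a minimal sketch of that idea. It assumes a hypothetical in-memory link graph (`LINKS`) standing in for real HTTP downloads so it runs without network access, and it is only an illustration, not the crawler from this post.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

public class BfsCrawlSketch {
    // Hypothetical stand-in for the web: page URL -> links found on that page.
    static final Map<String, List<String>> LINKS = Map.of(
            "a.com", List.of("b.com", "c.com"),
            "b.com", List.of("a.com", "d.com"),
            "c.com", List.of("d.com"),
            "d.com", List.of());

    // Visits pages breadth-first starting at seed, stopping after maxPages pages.
    static List<String> crawl(String seed, int maxPages) {
        List<String> visitedOrder = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visitedOrder.size() < maxPages) {
            String url = frontier.poll();
            visitedOrder.add(url); // a real crawler would download and index the page here
            for (String next : LINKS.getOrDefault(url, List.of())) {
                if (seen.add(next)) { // add() returns false for URLs already queued
                    frontier.add(next);
                }
            }
        }
        return visitedOrder;
    }

    public static void main(String[] args) {
        System.out.println(crawl("a.com", 10)); // prints [a.com, b.com, c.com, d.com]
    }
}
```

The `maxPages` cap matters on the real web, where the link graph is effectively unbounded.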

 

Web crawlers can be created in Java using multiple producer and consumer threads along with queues that store the HTML of each webpage and the links waiting to be visited.

This crawler was made as a project for my systems programming class.
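Only the consumer side is shown below. As a rough sketch of how the two kinds of threads can hand work to each other, this example uses `java.util.concurrent.LinkedBlockingQueue` in place of my `PageQueue` and `LinkQueue` classes, plus a hypothetical `fetch` method that fakes a download instead of making an HTTP request. The producer takes links, "downloads" them, and puts the HTML on the page queue; the consumer takes pages and counts a keyword. A sentinel value (`POISON`) tells the consumer to stop.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class ProducerConsumerSketch {
    // Sentinel telling the consumer that no more pages will arrive.
    static final String POISON = "<<done>>";

    // Hypothetical stand-in for an HTTP download: returns fake HTML for a URL.
    static String fetch(String url) {
        return "<html><body>java crawler visited " + url + "</body></html>";
    }

    // Crawls the given links and returns the total number of keyword occurrences.
    static int run(List<String> links, String keyword) {
        BlockingQueue<String> linkQueue = new LinkedBlockingQueue<>(links);
        BlockingQueue<String> pageQueue = new LinkedBlockingQueue<>();
        AtomicInteger hits = new AtomicInteger();

        // Producer: takes links, "downloads" them, and queues the HTML.
        Thread producer = new Thread(() -> {
            String url;
            while ((url = linkQueue.poll()) != null) {
                pageQueue.add(fetch(url));
            }
            pageQueue.add(POISON);
        });

        // Consumer: blocks on the page queue and counts keyword occurrences.
        Thread consumer = new Thread(() -> {
            try {
                for (String page = pageQueue.take(); !page.equals(POISON); page = pageQueue.take()) {
                    int from = page.indexOf(keyword);
                    while (from != -1) {
                        hits.incrementAndGet();
                        from = page.indexOf(keyword, from + keyword.length());
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return hits.get();
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a.com", "b.com", "c.com"), "java")); // prints 3
    }
}
```

Unlike my consumer, this sketch does not feed newly found links back into the link queue, which keeps shutdown simple; a real crawler that does so also needs a visited set and a page limit to know when to stop.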

Consumer thread:

[code language="java"]
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ConsumerThread extends Thread {
    // keyword -> total occurrences across all pages (shared by all consumers, so thread-safe)
    private static final Map<String, Integer> userKeywords = new ConcurrentHashMap<>();
    // keyword -> number of pages that contain the keyword at least once
    private static final Map<String, Integer> keywordPages = new ConcurrentHashMap<>();
    private static final AtomicInteger totalKeywords = new AtomicInteger();
    private volatile boolean done = false;

    /**
     * Defines the run method of a ConsumerThread, which gets the HTML of a webpage
     * from the PageQueue and finds the links throughout the page as well as the
     * user keywords.
     */
    @Override
    public void run() {
        while (!done) {
            // get page text
            Document pageText = PageQueue.getNextPage();
            // queue every absolute link on the page so it can be crawled later
            for (Element link : pageText.select("a[href]")) {
                LinkQueue.addLink(link.absUrl("href"));
            }
            String html = pageText.toString();
            // find instances of the user-entered keywords
            for (String key : userKeywords.keySet()) {
                userKeywords.merge(key, countOccurrences(html, key), Integer::sum);
            }
            // find pages with each keyword
            for (String key : keywordPages.keySet()) {
                int count = countOccurrences(html, key);
                if (count > 0) {
                    keywordPages.merge(key, 1, Integer::sum);
                    totalKeywords.addAndGet(count);
                }
            }
        }
    }

    // Ends the loop once the page currently being processed is finished.
    public void shutdown() { done = true; }

    // Literal, non-overlapping count; String.split(key) would treat key as a regex.
    private static int countOccurrences(String text, String key) {
        if (key.isEmpty()) return 0;
        int count = 0;
        for (int i = text.indexOf(key); i != -1; i = text.indexOf(key, i + key.length())) count++;
        return count;
    }
}
[/code]
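Counting keywords with `pageText.toString().split(key)` is fragile for two reasons: `String.split` interprets its argument as a regular expression (so a keyword like `.` matches every character), and it discards trailing empty strings (so an occurrence at the very end of the text is not counted). The small comparison below shows both effects; `splitCount` and `indexOfCount` are just illustrative names.

```java
public class KeywordCountSketch {
    // Split-based count: treats key as a regex and drops trailing empty pieces.
    static int splitCount(String text, String key) {
        return text.split(key).length - 1;
    }

    // Literal, non-overlapping count using indexOf; no regex involved.
    static int indexOfCount(String text, String key) {
        if (key.isEmpty()) {
            return 0; // an empty key would otherwise loop forever
        }
        int count = 0;
        for (int i = text.indexOf(key); i != -1; i = text.indexOf(key, i + key.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String page = "java crawler in java";
        System.out.println(splitCount(page, "java") + " vs " + indexOfCount(page, "java")); // prints 1 vs 2
        System.out.println(splitCount("a.b", ".") + " vs " + indexOfCount("a.b", ".")); // prints -1 vs 1
    }
}
```

If regex keywords are actually wanted, wrapping the key with `java.util.regex.Pattern.quote` at least makes the match literal, but the trailing-match problem with `split` remains.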

Google Pledges $1 Billion

Google plans to donate $1 billion to nonprofits through its charitable arm, Google.org, with the aim of addressing the gap between the skills required by modern companies and the skills that are taught in schools. Google said it was donating $10 million to Goodwill Industries, for example, for digital job training programs. Company employees also will volunteer one million hours at those nonprofits.

Much like a political campaign, Google will go on the road to spread the message about its new program, it said. In the coming months, company officials will make stops in Indianapolis; Oklahoma City; Lansing, Mich.; and Savannah, Ga.

Google is not the only big tech company that has gone on a charm offensive in recent months. Under fire from President Trump for producing most of its devices in China, Apple announced in May that it was creating a $1 billion fund to invest in advanced manufacturing in the United States. Amazon, another frequent target of Mr. Trump, said in January that it was planning to hire 100,000 new employees over the next 18 months.

And Sheryl Sandberg, Facebook’s high-profile chief operating officer, made the rounds on Capitol Hill this week, offering explanations to congressional leaders about her company’s role in last year’s election.

Google has long been a leader in research on artificial intelligence, and Sundar Pichai, Google’s chief executive, has made it the centerpiece of his company’s plans. Google and its parent company, Alphabet, are leaning on A.I. technology in all manner of products, from new smartphones to self-driving cars. But with that automation comes disruption and concern that breakthroughs may upend entire industries and eliminate millions of jobs, particularly in trucking and transportation.

Pittsburgh is one traditional manufacturing city, however, that could gain from advances in artificial intelligence. Researchers at Carnegie Mellon University in the city have been on the cutting edge of work on autonomous vehicles, and other big tech outfits like Uber and Amazon also have offices there.

Google has been put under a harsh light over the last year, an unaccustomed spot for a company that has long been one of the tech industry’s most admired outfits and is still considered one of the best places in the world to work.

The European Union levied the largest antitrust fine in history against Google for unfairly favoring its own services over those of its rivals. A group of former employees sued the company, accusing it of paying women less than men. It also suffered an exodus of advertisers from its video platform, YouTube, after evidence that ads appeared next to extremist videos.

More recently, it has been mired in the spreading investigation of Russian interference in the election. Company representatives are expected to speak at House and Senate Intelligence Committee hearings on Nov. 1, along with Facebook and Twitter.

Most Popular Programming Languages of 2017

It may take a while, but over time, the popularity of various programming languages does rise and fall significantly. That’s why New Relic (where you can find the original post) takes the time to survey the programming language landscape annually, checking up on enduring favorites and keeping an eye out for emerging trends. We look at a variety of metrics and sources, ranging from job listings to activity on developer forums.

This year, Java remains the programming language skill most in demand among employers — as it was last year — while JavaScript continues to reign atop the lists of languages most commonly used by coders. But changes are rumbling beneath the top spots, as popular frameworks and technological trends like the Internet of Things (IoT) raise the profile of certain specialized languages, while so-called “functional languages” also draw increasing interest.

Microservices Tilt the Landscape

Another driving force behind these trends is the growing adoption of a microservices architecture. “Eight years ago, monolithic programs were popular,” says Neha Duggal, product manager for New Relic APM. “Now people are moving toward microservices, breaking down the notion of an application into manageable pieces.” This has spurred interest in a lot of different programming languages, including newer languages like Scala, Kotlin, and Apache Groovy that run on a Java Virtual Machine (JVM). Microservices normally use asynchronous communications, Neha says, and the newer languages often handle such communications better than Java itself. The trend is further reinforced by the growing popularity of frameworks like Eclipse Vert.x, which let you use any JVM-based language.

Which Programming Languages Do Employers Want?

With that in mind, let’s take a closer look at the programming language skills employers are searching for right now. We asked job-search site Indeed to extract the language skills most often appearing in job postings for software development roles from July 2016 through June 2017. We also looked at IEEE Spectrum’s ranking of the languages most in demand in June 2017 job listings on the Dice and CareerBuilder sites.

Java tops both lists, and “Java continues to be the most in-demand programming language, year over year,” notes Doug Gray, Indeed’s senior vice president of engineering. “This is not surprising, as Java is stable and great for scaling, which is especially important with larger companies.”

The lists are remarkably similar, in fact, with only a few entries — .NET, SQL, Node.js, Swift, some C variants — showing up on one and not the other. And even those minor differences may reflect variations in definitions and methodology rather than a real difference in popularity. Node.js does not appear on the IEEE list, for example, perhaps because the JavaScript runtime framework is included under JavaScript. SQL, #6 at Indeed, comes in at #16 on the IEEE list, likely because it’s not represented in that list’s Web or Mobile categories. (The IEEE rankings let you sort by various filters.) And some might argue that JavaScript is more a scripting language than a programming language.

Just as notable, the 2017 lists include eight of the top ten employer-requested languages from last year, indicating that programming job requirements have remained relatively stable. The popularity of Node.js, for example, could also explain JavaScript’s jump from fourth place on the Indeed list last year to second this year. A Forrester report last November found that the platform is being used for many purposes, including IoT innovation, and it’s the most commonly used framework according to this year’s Stack Overflow Developer Survey. Similarly, while .NET jumped into the #3 spot on the Indeed list, room may have been created by the consolidation of C and C++ into a single entry, compared to C#, C++, and C all individually making the top 10 last year.

Just because the top 10 languages are pretty well established, however, doesn’t mean the landscape is frozen. “R and Python are on the rise with the convergence of IoT and Machine Learning,” observes Kellet Atkinson, director of marketing for developer community provider DZone.com. “‘Big Data’ is the top search term on our job board, and Python is in the top five languages being written about.” The IEEE rankings identify Python as the language growing most rapidly, with R coming in at #8.

What Programming Languages Do Coders Use?

To see what languages developers are most interested in using, let’s look at RedMonk’s Programming Language Rankings, which draw on code-pulls from GitHub combined with discussions on Stack Overflow. In addition, Stack Overflow conducts its own Developer Survey of what developers are actually using.

Look familiar? It makes sense that the languages developers are using match up relatively closely with what employers are looking for. (One exception: TypeScript makes an appearance on the Stack Overflow list. Since it compiles to plain JavaScript, it could be riding on that #1 language’s coattails.)

What Programming Languages Do Coders Like?

More interesting, perhaps, is what languages developers actually prefer. According to Todd West, a Lead Software Engineer at New Relic, engineers tend to favor the languages they happened to learn first, as well as those that are easy to use and offer both cutting-edge innovation and fast performance. Stack Overflow addresses this issue with an annual survey of the most-loved, most-wanted, and most-dreaded languages.

Apparently, developers not only use JavaScript/TypeScript, Python, and C#, they actually like them. As for Rust, perhaps developers like the systems programming language’s speed, ability to prevent segfaults, and guarantee of thread safety. (We have nothing to say about why 1980-vintage Smalltalk is still so adored.)

What Programming Languages Are on the Rise?

As CEO of coding bootcamp Bloc, Roshan Choxi pays attention to what developers are talking about on forums like Hacker News and Reddit’s r/programming, as well as what incoming students show interest in. “The one new trend that stands out from the past year is the increasing influence of functional programming,” he says, referring to languages like Haskell, Erlang, Elixir, Elm, and Clojure. (Some of those choices show up in the lists above.)

“It seems to be an answer to a lot of the common problems developers face today: concurrency, state management, and reliability … JavaScript may have something to do with this,” explains Roshan. “While it’s not exactly a functional language, it does emphasize functions as first-class citizens, and there are several popular projects that allow you to adopt functional programming features into your JavaScript code. For example, Redux introduced the concept of message passing and TypeScript allows you to plug in static typing, both of which are common in functional languages.”

Polyglot Programming Still a Winner

Another trend we identified last year is still going strong: working with more than one language, as individual polyglot programmers and/or organizations leverage different languages for different purposes. “We are increasingly seeing organizations using more than one language in their ecosystem,” says New Relic’s Neha Duggal. “You might have different teams writing microservices, and each team can pick a language they’re comfortable using for that service.”

DZone’s Kellet Atkinson has noticed the same thing: “With the push towards DevOps, there has been a swell in the idea of the ‘full-stack developer’ who knows multiple languages.” DZone launched a job board a couple of months ago, he adds, and “a lot of the jobs are asking for full-stack developers, and a lot of people in our audience are trending toward considering themselves full-stack developers.”

What Does It All Mean?

At first glance, it may seem like little has changed in the world of programming languages over the past year. But the familiar players still dominating the top of the popularity, usage, and desirability charts should not obscure the fact that newer languages and growing trends are now significantly affecting the choices made by individual programmers and software development teams.

So while proficiency with Java, JavaScript, .NET, Python, various iterations of C, Ruby, PHP, HTML, and CSS remains a safe choice, forward-looking developers may also want to familiarize themselves with newer contenders like Haskell, Erlang, Elixir, Elm, and Clojure. Looking ahead, knowing more than one language seems set to be increasingly valuable, as more employers look for developers who are comfortable working across the full stack.

‘You can’t wash your hands of this.’ Watch University of Washington computer science chair sound off on Facebook and Russia

Ed Lazowska, chair of the University of Washington’s Computer Science and Engineering School, wants Facebook and other big tech companies to take responsibility for enabling Russian meddling in the 2016 election.

The longtime fixture of the Seattle tech community expressed his frustration with social media giants in interviews with GeekWire and Bloomberg Technology at the GeekWire Summit this week.

“You can’t wash your hands of this,” he said. “It’s a difficult technical problem because, at the scale they operate at, detecting fake news, detecting ad placements by third-parties you wouldn’t want to be placing ads, is a problem.”

Facebook, in particular, has earned Lazowska’s ire. He said, unlike Microsoft, “Facebook has nothing to sell except what they know about you and ads.”

Facebook, Twitter, and Google are all under federal scrutiny for the role their platforms played in Russian meddling in the election. All three companies have been asked to testify before the House Intelligence Committee on Nov. 1. In a recent interview with Axios, Facebook COO Sheryl Sandberg said, “It’s not just that we apologize. We’re angry, we’re upset, but what we really owe the American people is determination,” to address foreign meddling.

“They’ve got to be able to do better than this,” Lazowska told GeekWire this week. “There’s lots of smart people working there so the challenge is, step one, take responsibility and step two, dig in and solve the problems.”